Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for EfficientNet #639

Merged
merged 2 commits into from
Mar 16, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -294,6 +294,7 @@ You can refine your search by selecting the task you're interested in (e.g., [te
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
Expand Down
1 change: 1 addition & 0 deletions docs/snippets/6_supported-models.snippet
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,7 @@
1. **[DiT](https://huggingface.co/docs/transformers/model_doc/dit)** (from Microsoft Research) released with the paper [DiT: Self-supervised Pre-training for Document Image Transformer](https://arxiv.org/abs/2203.02378) by Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Cha Zhang, Furu Wei.
1. **[Donut](https://huggingface.co/docs/transformers/model_doc/donut)** (from NAVER), released together with the paper [OCR-free Document Understanding Transformer](https://arxiv.org/abs/2111.15664) by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, Seunghyun Park.
1. **[DPT](https://huggingface.co/docs/transformers/master/model_doc/dpt)** (from Intel Labs) released with the paper [Vision Transformers for Dense Prediction](https://arxiv.org/abs/2103.13413) by René Ranftl, Alexey Bochkovskiy, Vladlen Koltun.
1. **[EfficientNet](https://huggingface.co/docs/transformers/model_doc/efficientnet)** (from Google Brain) released with the paper [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks](https://arxiv.org/abs/1905.11946) by Mingxing Tan, Quoc V. Le.
1. **[ELECTRA](https://huggingface.co/docs/transformers/model_doc/electra)** (from Google Research/Stanford University) released with the paper [ELECTRA: Pre-training text encoders as discriminators rather than generators](https://arxiv.org/abs/2003.10555) by Kevin Clark, Minh-Thang Luong, Quoc V. Le, Christopher D. Manning.
1. **[ESM](https://huggingface.co/docs/transformers/model_doc/esm)** (from Meta AI) are transformer protein language models. **ESM-1b** was released with the paper [Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences](https://www.pnas.org/content/118/15/e2016239118) by Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. **ESM-1v** was released with the paper [Language models enable zero-shot prediction of the effects of mutations on protein function](https://doi.org/10.1101/2021.07.09.450648) by Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu and Alexander Rives. **ESM-2 and ESMFold** were released with the paper [Language models of protein sequences at the scale of evolution enable accurate structure prediction](https://doi.org/10.1101/2022.07.20.500902) by Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, Alexander Rives.
1. **[Falcon](https://huggingface.co/docs/transformers/model_doc/falcon)** (from Technology Innovation Institute) by Almazrouei, Ebtesam and Alobeidli, Hamza and Alshamsi, Abdulaziz and Cappelli, Alessandro and Cojocaru, Ruxandra and Debbah, Merouane and Goffinet, Etienne and Heslow, Daniel and Launay, Julien and Malartic, Quentin and Noune, Badreddine and Pannier, Baptiste and Penedo, Guilherme.
Expand Down
25 changes: 25 additions & 0 deletions src/models.js
Original file line number Diff line number Diff line change
Expand Up @@ -5411,6 +5411,29 @@ export class StableLmModel extends StableLmPreTrainedModel { }
export class StableLmForCausalLM extends StableLmPreTrainedModel { }
//////////////////////////////////////////////////


//////////////////////////////////////////////////
export class EfficientNetPreTrainedModel extends PreTrainedModel { }

/**
* The bare EfficientNet model outputting raw features without any specific head on top.
*/
export class EfficientNetModel extends EfficientNetPreTrainedModel { }

/**
* EfficientNet Model with an image classification head on top (a linear layer on top of the pooled features).
*/
export class EfficientNetForImageClassification extends EfficientNetPreTrainedModel {
/**
* @param {any} model_inputs
*/
async _call(model_inputs) {
return new SequenceClassifierOutput(await super._call(model_inputs));
}
}
//////////////////////////////////////////////////


//////////////////////////////////////////////////
// AutoModels, used to simplify construction of PreTrainedModels
// (uses config to instantiate correct class)
Expand Down Expand Up @@ -5532,6 +5555,7 @@ const MODEL_MAPPING_NAMES_ENCODER_ONLY = new Map([
['glpn', ['GLPNModel', GLPNModel]],

['hifigan', ['SpeechT5HifiGan', SpeechT5HifiGan]],
['efficientnet', ['EfficientNetModel', EfficientNetModel]],

]);

Expand Down Expand Up @@ -5706,6 +5730,7 @@ const MODEL_FOR_IMAGE_CLASSIFICATION_MAPPING_NAMES = new Map([
['resnet', ['ResNetForImageClassification', ResNetForImageClassification]],
['swin', ['SwinForImageClassification', SwinForImageClassification]],
['segformer', ['SegformerForImageClassification', SegformerForImageClassification]],
['efficientnet', ['EfficientNetForImageClassification', EfficientNetForImageClassification]],
]);

const MODEL_FOR_OBJECT_DETECTION_MAPPING_NAMES = new Map([
Expand Down
16 changes: 14 additions & 2 deletions src/processors.js
Original file line number Diff line number Diff line change
Expand Up @@ -640,7 +640,7 @@ export class ImageFeatureExtractor extends FeatureExtractor {

for (let i = 0; i < pixelData.length; i += image.channels) {
for (let j = 0; j < image.channels; ++j) {
pixelData[i + j] = (pixelData[i + j] - this.image_mean[j]) / this.image_std[j];
pixelData[i + j] = (pixelData[i + j] - image_mean[j]) / image_std[j];
}
}
}
Expand Down Expand Up @@ -811,6 +811,17 @@ export class ConvNextImageProcessor extends ConvNextFeatureExtractor { } // NOT
export class ViTFeatureExtractor extends ImageFeatureExtractor { }
export class ViTImageProcessor extends ImageFeatureExtractor { }

export class EfficientNetImageProcessor extends ImageFeatureExtractor {
constructor(config) {
super(config);
this.include_top = this.config.include_top ?? true;
if (this.include_top) {
this.image_std = this.image_std.map(x => x * x);
}
}
}


export class MobileViTFeatureExtractor extends ImageFeatureExtractor { }
export class OwlViTFeatureExtractor extends ImageFeatureExtractor {
/** @type {post_process_object_detection} */
Expand All @@ -836,7 +847,7 @@ export class DonutFeatureExtractor extends ImageFeatureExtractor {
image_std = new Array(imageChannels).fill(image_mean);
}

const constant_values = image_mean.map((x, i) => - x / this.image_std[i]);
const constant_values = image_mean.map((x, i) => - x / image_std[i]);

return super.pad_image(pixelData, imgDims, padSize, {
center: true,
Expand Down Expand Up @@ -2132,6 +2143,7 @@ export class AutoProcessor {
YolosFeatureExtractor,
DonutFeatureExtractor,
NougatImageProcessor,
EfficientNetImageProcessor,

ViTImageProcessor,
VitMatteImageProcessor,
Expand Down
17 changes: 17 additions & 0 deletions tests/processors.test.js
Original file line number Diff line number Diff line change
Expand Up @@ -45,6 +45,7 @@ describe('Processors', () => {
clip: 'openai/clip-vit-base-patch16',
vitmatte: 'hustvl/vitmatte-small-distinctions-646',
dinov2: 'facebook/dinov2-small-imagenet1k-1-layer',
efficientnet: 'google/efficientnet-b0',
}

const TEST_IMAGES = {
Expand Down Expand Up @@ -426,6 +427,22 @@ describe('Processors', () => {
}
}, MAX_TEST_EXECUTION_TIME);

// EfficientNetImageProcessor
// - tests include_top
it(MODELS.efficientnet, async () => {
const processor = await AutoProcessor.from_pretrained(MODELS.efficientnet)

{
const image = await load_image(TEST_IMAGES.cats);
const { pixel_values, original_sizes, reshaped_input_sizes } = await processor(image);

compare(pixel_values.dims, [1, 3, 224, 224]);
compare(avg(pixel_values.data), 0.3015307230282871);

compare(original_sizes, [[480, 640]]);
compare(reshaped_input_sizes, [[224, 224]]);
}
}, MAX_TEST_EXECUTION_TIME);
});

describe('Audio processors', () => {
Expand Down
Loading