Problem Description
After fine-tuning and modifying the structure of the MobileBERT model, I successfully exported it as a `.pte` file using the XNNPACK backend. When tested with `xnn_executor_runner`, the model produced the expected output shape `[1, 12]`. However, when invoked from the Android application, the model still outputs the original MobileBERT shape (e.g., `[1, 512]`), so the desktop and mobile environments are inconsistent. The Android application is modified from the ExecutorchDemo sample.
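
For context, the structural change is a replacement of the classification head so that it emits 12 logits. A minimal sketch, assuming the Hugging Face `MobileBertForSequenceClassification` implementation (the actual fine-tuning code may differ):

```python
# Sketch of the described head change, assuming the Hugging Face
# MobileBERT implementation (an assumption; the real training code
# may construct the head differently).
import torch.nn as nn
from transformers import MobileBertForSequenceClassification

model = MobileBertForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased", num_labels=12
)
# Equivalent manual replacement of the classification head:
# model.classifier = nn.Linear(model.config.hidden_size, 12)
model.eval()
```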
Steps to Reproduce

1. Model Modification and Export
   - Modified the classification head of MobileBERT (adjusted the output dimension to 12 classes).
   - Used the `aot_compiler` toolchain to export the model as `mobilebert_xnnpack_fp32.pte` (a sketch of the underlying export flow follows this list).
   - Verified the output shape using the following command:

     ```bash
     ./cmake-out/backends/xnnpack/xnn_executor_runner --model_path=mobilebert_xnnpack_fp32.pte
     # Output 0: tensor(sizes=[1, 12])
     ```
2. Android Integration
   - Added the `.pte` file to the `assets` directory of the Android project.
   - Loaded the model using standard JNI calls:

     ```java
     // Load the model from the app's assets
     mModule = Module.load(assetFilePath(this, MODEL_FILE));
     // Run inference with the two input tensors
     outputTensor = mModule.forward(
         new EValue[] {
             EValue.from(inputs[0]), // input_ids
             EValue.from(inputs[1]), // attention_mask
         }
     )[1].toTensor();
     ```

   - During inference, the output shape remains the original `[1, 512]`.
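
For reference, the export step corresponds roughly to the flow below, which the `aot_compiler` script wraps. This is a minimal sketch assuming the current `executorch` export API; the checkpoint and the sequence length of 128 are placeholders, not the exact values used here.

```python
# Minimal sketch of the XNNPACK lowering flow behind aot_compiler.
# Placeholders: the pretrained checkpoint and sequence length (128)
# stand in for the actual fine-tuned 12-class model and inputs.
import torch
from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge
from transformers import MobileBertForSequenceClassification

model = MobileBertForSequenceClassification.from_pretrained(
    "google/mobilebert-uncased", num_labels=12
).eval()
model.config.return_dict = False  # tuple outputs trace more cleanly

example_inputs = (
    torch.ones(1, 128, dtype=torch.long),  # input_ids
    torch.ones(1, 128, dtype=torch.long),  # attention_mask
)

ep = torch.export.export(model, example_inputs)
edge = to_edge(ep).to_backend(XnnpackPartitioner())
prog = edge.to_executorch()

with open("mobilebert_xnnpack_fp32.pte", "wb") as f:
    f.write(prog.buffer)
```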
Expected Behavior
The Android application should produce an output of shape `[1, 12]`, consistent with the desktop `xnn_executor_runner` results.
Actual Behavior
The Android output retains the original model shape `[1, 512]`, indicating that the modification has not taken effect.
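
As an additional cross-check, the same `.pte` file can be exercised through the ExecuTorch Python bindings, independent of both the C++ runner and the Android app. The import path below is from the `executorch` repo and may vary across versions:

```python
# Cross-check the .pte outside both runners via the ExecuTorch Python
# bindings (import path may differ across executorch versions; the
# sequence length of 128 is a placeholder).
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("mobilebert_xnnpack_fp32.pte")
outputs = module.forward([
    torch.ones(1, 128, dtype=torch.long),  # input_ids
    torch.ones(1, 128, dtype=torch.long),  # attention_mask
])
print([tuple(o.shape) for o in outputs])  # (1, 12) expected per the desktop run
```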