Hybrid convolutional vision transformer for extrusion-based 3D food-printing defect classification
10.11591/ijai.v14.i4.pp3311-3323
Cholid Mawardi, Agus Buono, Karlisa Priandana, Herianto Herianto
Deep learning is widely used to remotely monitor three-dimensional (3D) printing results, including those of extrusion-based 3D food printing. One of the most widely used deep learning algorithms for defect detection in 3D printing is the convolutional neural network (CNN). However, CNN-based detection incurs a high computational cost and requires a large dataset. This research proposes Con4ViT, a hybrid model that combines the strengths of the vision transformer with the inherent feature extraction capabilities of the CNN. The features extracted locally by the CNN are merged with the transformer's global features through four transformer encoder blocks. The proposed model has fewer parameters than other lightweight pre-trained deep learning models such as VGG16, VGG19, EfficientNetB2, InceptionV3, and ResNet50, and is therefore simpler. Simulations were conducted to classify defect and non-defect images obtained from the printing results of a purpose-built extrusion-based 3D food printing device. The proposed model achieved an accuracy of 95.43%, higher than the state-of-the-art techniques VGG16, VGG19, MobileNetV2, EfficientNetB2, InceptionV3, and ResNet50, which achieved accuracies of 77.88%, 86.30%, 82.95%, 90.87%, 84.62%, and 93.83%, respectively. These results show that the proposed Con4ViT model can detect defects in 3D food printing with high accuracy.
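The hybrid design described in the abstract can be illustrated with a minimal PyTorch sketch: a small CNN stem extracts local features, the resulting feature map is flattened into tokens, and four transformer encoder blocks mix those tokens globally before a two-class (defect / non-defect) head. This is not the published Con4ViT architecture. The class name `HybridCnnTransformer`, the layer widths, the number of attention heads, the 64×64 input size, and the omission of positional embeddings are all assumptions made for illustration; only the use of four encoder blocks and the binary output follow the abstract.

```python
import torch
import torch.nn as nn


class HybridCnnTransformer(nn.Module):
    """Illustrative CNN + transformer-encoder classifier (hypothetical, not Con4ViT itself)."""

    def __init__(self, num_classes=2, embed_dim=64, num_heads=4, num_blocks=4):
        super().__init__()
        # CNN stem: extracts local features and downsamples the image by 4x.
        # Channel widths are assumed values, not taken from the paper.
        self.stem = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.Conv2d(32, embed_dim, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(embed_dim),
            nn.ReLU(),
        )
        # Four transformer encoder blocks, as stated in the abstract.
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=2 * embed_dim,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_blocks)
        # Classification head: defect vs. non-defect.
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, images):
        features = self.stem(images)                  # (B, C, H/4, W/4)
        tokens = features.flatten(2).transpose(1, 2)  # (B, H/4 * W/4, C)
        tokens = self.encoder(tokens)                 # global mixing across tokens
        return self.head(tokens.mean(dim=1))          # average-pool tokens, then classify


model = HybridCnnTransformer()
logits = model(torch.randn(2, 3, 64, 64))  # batch of two random 64x64 RGB images
print(logits.shape)  # torch.Size([2, 2])
print(sum(p.numel() for p in model.parameters()), "parameters")
```

Running the CNN stem first shortens the token sequence seen by the encoder (here 256 tokens for a 64×64 input), which is one plausible way a hybrid of this kind keeps its parameter count and compute low.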