SpiceVision: A Dual-Attention EfficientNetB3 Framework with Multi-Scale CBAM Feature Fusion for Fine-Grained Spice Image Classification


Deep Kamle
Dr. Satyendra Sharma
https://orcid.org/0000-0001-7155-3313
Dr. Hemang Shrivastava

Abstract

Visual classification of raw spices is essential for food quality and safety assurance, automated sorting in retail settings, culinary education, and the detection of economic adulteration. It is difficult, however, because many spices are highly similar in color, granularity, and texture. This work introduces SpiceVision, a lightweight hybrid deep learning framework with four parts. The first is an ImageNet-pretrained EfficientNetB3 backbone consisting of its first six MBConv blocks. The second is a pair of Convolutional Block Attention Modules (CBAM) attached to two intermediate stages. The third is an upsample-and-fuse multi-scale module that merges the two attention-refined feature maps. The fourth is a classification head that uses dual global pooling. Input images are preprocessed with a four-step procedure that includes resizing to 128×128, Contrast-Limited Adaptive Histogram Equalization (CLAHE), and per-channel normalization learned by the model itself. The network was trained on a curated dataset of 17,452 images of 20 spices, split 70% / 15% / 15% into 12,215 training, 2,619 validation, and 2,618 test images. Training used a two-stage transfer-learning process: 20 epochs of warm-up and cosine learning-rate decay, followed by 30 epochs of fine-tuning at a low learning rate. Only the attention, fusion, and classification layers were updated (765,032 of 5,943,033 parameters, or 12.9%), and the EfficientNetB3 backbone remained completely frozen. The trained model achieves 98.62% test accuracy, a macro F1-score of 98.70%, a weighted F1-score of 98.62%, and a macro one-vs-rest ROC-AUC of 0.9999. In the per-class evaluation, every one of the 20 spices reaches a precision, recall, and F1-score above 0.94. The small remaining confusion is concentrated almost entirely in a few pairs of texturally similar seed and powder types, such as cloves vs. cumin seeds and coriander vs. coriander seeds.
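The sketch below illustrates roughly what each CBAM stage computes. It uses NumPy and follows the standard CBAM formulation in two steps. Channel attention comes from a shared two-layer MLP applied to average-pooled and max-pooled descriptors. Spatial attention then comes from a 7×7 convolution over the channel-wise mean and max maps. Several details are assumptions for illustration and are not reported for SpiceVision: the 16×16×48 feature map, the reduction ratio of 8, the random weights, and the omission of bias terms.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """x: (H, W, C) feature map -> (C,) weights in (0, 1).

    A shared MLP (C -> C/r -> C, ReLU in between, biases omitted) is applied to
    the average-pooled and max-pooled channel descriptors, and the results summed.
    """
    def mlp(v):
        return np.maximum(v @ w1, 0.0) @ w2
    return sigmoid(mlp(x.mean(axis=(0, 1))) + mlp(x.max(axis=(0, 1))))

def spatial_attention(x, kernel):
    """x: (H, W, C), kernel: (k, k, 2) -> (H, W) weights in (0, 1).

    Correlates the stacked channel-wise mean and max maps with a k x k filter
    using zero 'same' padding (no bias), then applies a sigmoid.
    """
    desc = np.stack([x.mean(axis=2), x.max(axis=2)], axis=-1)  # (H, W, 2)
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(desc, ((pad, pad), (pad, pad), (0, 0)))
    h, w = x.shape[:2]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + k, j:j + k] * kernel)
    return sigmoid(out)

def cbam(x, w1, w2, kernel):
    """Channel attention first, then spatial attention, as in CBAM."""
    x = x * channel_attention(x, w1, w2)                # rescale each channel
    return x * spatial_attention(x, kernel)[..., None]  # rescale each location

# Hypothetical intermediate feature map and randomly initialised weights.
rng = np.random.default_rng(0)
C, r = 48, 8
x = rng.standard_normal((16, 16, C))
w1 = 0.1 * rng.standard_normal((C, C // r))
w2 = 0.1 * rng.standard_normal((C // r, C))
kernel = 0.1 * rng.standard_normal((7, 7, 2))
y = cbam(x, w1, w2, kernel)
```

Both attention maps lie strictly between 0 and 1, so the module can only reweight activations, not amplify them. It is also cheap. With the assumed sizes it has 2·48·6 + 7·7·2 = 674 weights.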
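The upsample-and-fuse step and the dual-pooling head can be sketched as below. The abstract does not specify the upsampling method, how the two maps are combined, or the stage resolutions. This sketch therefore makes several assumptions:

- nearest-neighbour upsampling;
- channel concatenation;
- hypothetical shapes of 16×16×48 for the finer stage and 8×8×136 for the coarser stage;
- a single randomly initialised softmax layer over the 20 classes.

```python
import numpy as np

def upsample_nearest(x, factor):
    """(H, W, C) -> (H*factor, W*factor, C) by repeating each pixel."""
    return x.repeat(factor, axis=0).repeat(factor, axis=1)

def fuse_multiscale(fine, coarse):
    """Bring the coarser map onto the finer grid, then concatenate channels."""
    factor = fine.shape[0] // coarse.shape[0]
    return np.concatenate([fine, upsample_nearest(coarse, factor)], axis=-1)

def dual_global_pool(x):
    """Global average and global max pooling, concatenated: (H, W, C) -> (2C,)."""
    return np.concatenate([x.mean(axis=(0, 1)), x.max(axis=(0, 1))])

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

rng = np.random.default_rng(1)
fine = rng.standard_normal((16, 16, 48))    # stand-in for the first CBAM output
coarse = rng.standard_normal((8, 8, 136))   # stand-in for the second CBAM output
fused = fuse_multiscale(fine, coarse)       # (16, 16, 184)
features = dual_global_pool(fused)          # (368,)
n_classes = 20
W = 0.01 * rng.standard_normal((features.size, n_classes))
b = np.zeros(n_classes)
probs = softmax(features @ W + b)           # class probabilities summing to 1
```

Concatenating the average-pooled and max-pooled vectors lets the head see both overall texture statistics and the strongest local responses. The cost is a doubling of the feature width.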
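The abstract names three of the four preprocessing steps. Those three can be approximated with the NumPy-only sketch below, which rests on these assumptions:

- nearest-neighbour resizing;
- an 8×8 tile grid with an OpenCV-style clip limit of 2.0;
- CLAHE applied to each RGB channel independently (applying it to the lightness channel after a Lab conversion is a common alternative that avoids colour shifts);
- normalization statistics fitted on the training images.

In practice, `cv2.createCLAHE` and a framework normalization layer would usually replace these hand-written functions.

```python
import numpy as np

def resize_nearest(img, size=(128, 128)):
    """Nearest-neighbour resize of an (H, W, C) image to size = (rows, cols)."""
    h, w = img.shape[:2]
    rows = np.arange(size[0]) * h // size[0]
    cols = np.arange(size[1]) * w // size[1]
    return img[rows[:, None], cols[None, :]]

def clahe_channel(ch, tiles=8, clip_limit=2.0, bins=256):
    """CLAHE on one uint8 channel whose height and width are divisible by `tiles`."""
    h, w = ch.shape
    th, tw = h // tiles, w // tiles
    luts = np.empty((tiles, tiles, bins))
    for i in range(tiles):
        for j in range(tiles):
            tile = ch[i * th:(i + 1) * th, j * tw:(j + 1) * tw]
            hist = np.bincount(tile.ravel(), minlength=bins).astype(float)
            limit = clip_limit * tile.size / bins           # clip height per bin
            excess = np.maximum(hist - limit, 0.0).sum()
            hist = np.minimum(hist, limit) + excess / bins  # redistribute clipped counts
            cdf = np.cumsum(hist)
            luts[i, j] = (bins - 1) * cdf / cdf[-1]         # per-tile equalisation map

    def neighbours(n, t):
        # Position of each pixel relative to tile centres, for bilinear blending.
        pos = (np.arange(n) + 0.5) / t - 0.5
        lo = np.clip(np.floor(pos).astype(int), 0, tiles - 1)
        hi = np.minimum(lo + 1, tiles - 1)
        frac = np.clip(pos - lo, 0.0, 1.0)
        return lo, hi, frac

    y0, y1, fy = neighbours(h, th)
    x0, x1, fx = neighbours(w, tw)
    y0, y1, fy = y0[:, None], y1[:, None], fy[:, None]
    x0, x1, fx = x0[None, :], x1[None, :], fx[None, :]
    # Blend the four nearest tile mappings to avoid visible tile boundaries.
    top = (1 - fx) * luts[y0, x0, ch] + fx * luts[y0, x1, ch]
    bottom = (1 - fx) * luts[y1, x0, ch] + fx * luts[y1, x1, ch]
    out = (1 - fy) * top + fy * bottom
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def enhance(img):
    """Resize, then apply CLAHE to each channel; returns a uint8 image."""
    img = resize_nearest(img)
    return np.stack([clahe_channel(img[..., c]) for c in range(img.shape[-1])], axis=-1)

def fit_channel_stats(images):
    """Per-channel mean and std over a training batch of shape (N, H, W, C)."""
    x = images.astype(np.float64)
    return x.mean(axis=(0, 1, 2)), x.std(axis=(0, 1, 2)) + 1e-7

def normalize(images, mean, std):
    return (images.astype(np.float64) - mean) / std

# Hypothetical raw photos of arbitrary size.
rng = np.random.default_rng(2)
raw = rng.integers(0, 256, size=(4, 200, 180, 3), dtype=np.uint8)
enhanced = np.stack([enhance(im) for im in raw])  # (4, 128, 128, 3) uint8
mean, std = fit_channel_stats(enhanced)
x = normalize(enhanced, mean, std)
```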
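A few lines of arithmetic confirm that the dataset and parameter figures in the abstract are internally consistent. The three subsets sum to the total and match the stated 70/15/15 split to within rounding. Also, 765,032 of 5,943,033 parameters is 12.87%, which is reported as 12.9%.

```python
# Counts as reported in the abstract.
total, n_train, n_val, n_test = 17_452, 12_215, 2_619, 2_618
trainable, all_params = 765_032, 5_943_033

assert n_train + n_val + n_test == total  # every image is in exactly one subset

split_pct = [round(100 * n / total, 2) for n in (n_train, n_val, n_test)]
frozen = all_params - trainable           # parameters not updated during training
trainable_pct = round(100 * trainable / all_params, 1)

print(split_pct, frozen, trainable_pct)   # [69.99, 15.01, 15.0] 5178001 12.9
```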
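One way to realize the two-stage learning-rate schedule is sketched below in plain Python. The abstract gives only the stage lengths (20 and 30 epochs). The remaining values are illustrative assumptions: a 3-epoch linear warm-up, a peak rate of 1e-3, a cosine floor of 1e-6, and a constant fine-tuning rate of 1e-5. The sketch also assumes that "20-epoch warm-up and cosine decay" describes a single 20-epoch stage that begins with the warm-up.

```python
import math

def lr_at(epoch, warmup_epochs=3, stage1_epochs=20, base_lr=1e-3,
          min_lr=1e-6, finetune_epochs=30, finetune_lr=1e-5):
    """Learning rate for a 0-indexed epoch across both training stages.

    Stage 1 (epochs 0..stage1_epochs-1): linear warm-up, then cosine decay.
    Stage 2 (the next finetune_epochs epochs): constant low learning rate.
    All numeric defaults are illustrative assumptions.
    """
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    if epoch < stage1_epochs:
        progress = (epoch - warmup_epochs) / (stage1_epochs - warmup_epochs)
        return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
    if epoch < stage1_epochs + finetune_epochs:
        return finetune_lr
    raise ValueError("epoch is past the end of the schedule")

schedule = [lr_at(e) for e in range(50)]
```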
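The small pure-Python sketch below makes two of the reported metrics concrete: the difference between macro and weighted F1-scores, and the meaning of macro one-vs-rest ROC-AUC. The 3-class confusion matrix and the probability rows are hypothetical toy data, not SpiceVision results. Libraries such as scikit-learn provide equivalent functions: `f1_score` with `average="macro"` or `average="weighted"`, and `roc_auc_score` with `multi_class="ovr"`.

```python
def per_class_f1(cm):
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    support = [sum(row) for row in cm]                               # true counts
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    f1 = []
    for k in range(n):
        tp = cm[k][k]
        precision = tp / predicted[k] if predicted[k] else 0.0
        recall = tp / support[k] if support[k] else 0.0
        denom = precision + recall
        f1.append(2 * precision * recall / denom if denom else 0.0)
    return f1, support

def macro_and_weighted_f1(cm):
    f1, support = per_class_f1(cm)
    macro = sum(f1) / len(f1)                                          # classes weigh equally
    weighted = sum(f * s for f, s in zip(f1, support)) / sum(support)  # weighted by class size
    return macro, weighted

def binary_auc(scores, labels):
    """ROC-AUC as P(positive score > negative score), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(prob_rows, y_true):
    """Average of one-vs-rest AUCs, treating each class in turn as positive."""
    n_classes = len(prob_rows[0])
    aucs = [binary_auc([row[k] for row in prob_rows], [y == k for y in y_true])
            for k in range(n_classes)]
    return sum(aucs) / n_classes

# Hypothetical confusion matrix: one large easy class and two confusable ones.
cm = [[50, 0, 0],
      [0, 8, 2],
      [0, 1, 9]]
macro, weighted = macro_and_weighted_f1(cm)
print(round(macro, 4), round(weighted, 4))  # 0.8997 0.957

# Hypothetical predicted probabilities; the last sample's top class is wrong.
probs = [[0.7, 0.2, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
labels = [0, 1, 2, 2]
print(macro_ovr_auc(probs, labels))  # 1.0, because AUC scores ranking, not argmax
```

Macro F1 falls below weighted F1 here because the two small, confusable classes count as much as the large easy one. In the same way, a near-perfect AUC can coexist with a few misclassifications.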

Article Details

Section: Research Articles

How to Cite

Kamle, D., Sharma, S., & Shrivastava, H. (2026). SpiceVision: A Dual-Attention EfficientNetB3 Framework with Multi-Scale CBAM Feature Fusion for Fine-Grained Spice Image Classification. International Journal of IoT, Embedded Systems and Industrial Automation, 1(2), e001. https://doi.org/10.66261/nz54mw97