
Issue

Vol. 7, No. 4, November 2022

Issue Published: Nov 30, 2022

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Image Captioning using Hybrid of VGG16 and Bidirectional LSTM Model

https://doi.org/10.22219/kinetik.v7i4.1568
Yufis Azhar
Universitas Muhammadiyah Malang
M. Randy Anugerah
Universitas Muhammadiyah Malang
Muhammad Al Reza Fahlopy
Universitas Muhammadiyah Malang
Alfin Yusriansyah
Universitas Muhammadiyah Malang

Corresponding Author(s): Yufis Azhar

yufis@umm.ac.id

Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, Vol. 7, No. 4, November 2022
Article Published: Nov 30, 2022


Abstract

Image captioning is one of the major challenges at the intersection of computer vision and natural language processing. Many studies have addressed image captioning, but their evaluation results remain low; this study therefore focuses on improving on the results of previous work. We use the Flickr8k dataset and the VGG16 Convolutional Neural Network (CNN) as an encoder that extracts features from images, with a Recurrent Neural Network (RNN) based on Bidirectional Long Short-Term Memory (BiLSTM) as the decoder. The feature vectors produced by the encoder are passed to the Bidirectional LSTM, which generates descriptions that match the input image or visual content. The captions provide information on an object's name, location, color, size, and features, as well as its surroundings. Captions are generated with a greedy search algorithm using the argmax function and with a beam search algorithm, and the outputs are evaluated with Bilingual Evaluation Understudy (BLEU) scores. The best result in this study is obtained by the VGG16 model with Bidirectional LSTM using beam search with parameter K = 3, which achieves a BLEU-1 score of 0.60593, superior to previous studies.
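To make the pipeline above concrete, the following is a minimal Keras sketch of the encoder-decoder design the abstract describes: a pretrained VGG16 as the image encoder and a Bidirectional LSTM as the caption decoder. Layer sizes, vocabulary size, maximum caption length, and the startseq/endseq tokens are illustrative assumptions, not the authors' exact configuration.

```python
# A minimal sketch of the VGG16 + Bidirectional LSTM architecture described in
# the abstract. vocab_size, max_len, and layer sizes are assumed values.
import numpy as np
from tensorflow.keras.applications import VGG16
from tensorflow.keras.applications.vgg16 import preprocess_input
from tensorflow.keras.layers import (Input, Dense, Dropout, Embedding, LSTM,
                                     Bidirectional, add)
from tensorflow.keras.models import Model

vocab_size = 8000  # assumed caption vocabulary size for Flickr8k
max_len = 34       # assumed maximum caption length in tokens

# Encoder: VGG16 pretrained on ImageNet with the classification layer removed,
# so each 224x224 image is encoded as the 4096-d output of the fc2 layer.
base = VGG16(weights="imagenet")
encoder = Model(base.input, base.layers[-2].output)

def extract_features(image):
    """image: (224, 224, 3) RGB array -> (1, 4096) feature vector."""
    return encoder.predict(preprocess_input(np.expand_dims(image, 0)), verbose=0)

# Decoder: the image feature vector and the partial caption are projected to a
# common size, merged, and a softmax over the vocabulary predicts the next word.
img_in = Input(shape=(4096,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))

seq_in = Input(shape=(max_len,))
emb = Embedding(vocab_size, 256, mask_zero=True)(seq_in)
seq_vec = Bidirectional(LSTM(256))(Dropout(0.5)(emb))  # 512-d: forward + backward
seq_vec = Dense(256, activation="relu")(seq_vec)

merged = Dense(256, activation="relu")(add([img_vec, seq_vec]))
out = Dense(vocab_size, activation="softmax")(merged)

caption_model = Model([img_in, seq_in], out)
caption_model.compile(loss="categorical_crossentropy", optimizer="adam")
```

A greedy decoder then simply appends the argmax word at each step; a sketch under the same assumptions follows. The beam search variant behind the best reported score (K = 3) would instead keep the three highest-probability partial captions at each step, and BLEU-1 can be computed with NLTK's corpus_bleu.

```python
# Greedy argmax decoding and BLEU-1 evaluation, under the same assumptions as
# the sketch above. word2idx/idx2word are hypothetical vocabulary mappings.
import numpy as np
from nltk.translate.bleu_score import corpus_bleu
from tensorflow.keras.preprocessing.sequence import pad_sequences

def greedy_caption(photo_feat, word2idx, idx2word):
    """Greedy search: append the argmax word at each step until 'endseq'."""
    seq = [word2idx["startseq"]]
    for _ in range(max_len):
        padded = pad_sequences([seq], maxlen=max_len)
        probs = caption_model.predict([photo_feat, padded], verbose=0)[0]
        seq.append(int(np.argmax(probs)))
        if idx2word[seq[-1]] == "endseq":
            break
    words = [idx2word[i] for i in seq[1:]]
    return " ".join(w for w in words if w != "endseq")

# BLEU-1 (unigram precision): each Flickr8k image has five reference captions.
# references = [[ref1_tokens, ..., ref5_tokens], ...]  # per test image
# hypotheses = [generated_tokens, ...]
# bleu1 = corpus_bleu(references, hypotheses, weights=(1.0, 0, 0, 0))
```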

Keywords

Image Captioning; VGG16; Bidirectional LSTM; Hybrid Method
How to Cite

Azhar, Y., Anugerah, M. R., Fahlopy, M. A. R., & Yusriansyah, A. (2022). Image Captioning using Hybrid of VGG16 and Bidirectional LSTM Model. Kinetik: Game Technology, Information System, Computer Network, Computing, Electronics, and Control, 7(4). https://doi.org/10.22219/kinetik.v7i4.1568


Author Biography

Yufis Azhar, Universitas Muhammadiyah Malang

Google Scholar profile: https://scholar.google.com/citations?user=B7GpEhIAAAAJ&hl=en

SINTA profile: http://sinta2.ristekdikti.go.id/authors/detail?id=160049&view=overview



 

