8th North America Conference on Industrial Engineering and Operations Management

Convolutional Neural Network Architectures Analysis for Image Captioning

DONG HO SHIN
Publisher: IEOM Society International
Track: High School STEM Poster Competition
Abstract

Image captioning models that use the attention method have improved significantly over earlier models, but they are still unsatisfactory at recognizing images. Early image captioning models combined a CNN as the encoder with an RNN as the decoder, so their performance depends on the particular CNN and RNN chosen. In particular, CNN architectures have differed in performance as they have evolved, and this affects the RNN used as the decoder. In this paper, we experiment with various CNN architectures as the encoder and analyze how the choice of architecture affects image captioning performance. We compare seven CNN architectures across batch sizes on a public benchmark, the MS-COCO dataset. All CNN architectures used in this study are pre-trained on the ImageNet dataset. In our experimental results, DenseNet (Huang et al. 2017) and InceptionV3 (Szegedy et al. 2016) achieved the most satisfactory results among the seven CNN architectures after training for 50 epochs on a GPU.
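The comparison described in the abstract (one captioning model per encoder architecture and batch size, then ranked) could be organized roughly as follows. This is only a minimal sketch under stated assumptions, not the author's code: `run_grid` and `train_and_score` are hypothetical names, the abstract names only two of the seven architectures and lists no batch sizes, and a higher score is assumed to be better.

```python
from itertools import product
from typing import Callable, Iterable, List, Tuple


def run_grid(
    encoders: Iterable[str],
    batch_sizes: Iterable[int],
    train_and_score: Callable[[str, int], float],
) -> List[Tuple[str, int, float]]:
    """Score every (encoder, batch size) pair and rank the results.

    `train_and_score` is a hypothetical callback supplied by the caller.
    It is expected to train a captioning model with the given pre-trained
    CNN encoder and batch size, then return one evaluation score.
    """
    results = [
        (name, batch_size, train_and_score(name, batch_size))
        for name, batch_size in product(encoders, batch_sizes)
    ]
    # Assumption: a higher score is better (for example a BLEU-style metric).
    return sorted(results, key=lambda row: row[2], reverse=True)
```

In use, the caller would pass the encoder names, the batch sizes, and a function that wraps the actual training and evaluation; the first element of the returned list is then the best-scoring configuration.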

 

Keywords

Deep Learning, Computer Vision, Image Captioning, CNN, DenseNet

Published in: 8th North America Conference on Industrial Engineering and Operations Management, Houston, United States of America

Date of Conference: June 13-15, 2023

ISBN: 979-8-3507-0546-1
ISSN/E-ISSN: 2169-8767