This paper proposes a comprehensive digital twin design for monitoring and optimizing air cargo ground operations, i.e., the logistical activities that take place between the time a cargo aircraft lands and the time it takes off again. Specifically, we focus on gathering the real-time data required to build the essential real-to-virtual connection of the digital twin, which would ultimately feed an optimization algorithm. This is challenging in the highly constrained and regulated environment of airports, where traditional data collection techniques, such as information systems and the Internet of Things, have proven insufficient for our data needs. We therefore developed a computer vision-based approach that leverages airports' existing camera networks to identify and track ground service vehicles, enabling real-time monitoring of cargo ground operations and aircraft (un)loading progress. Technically, we collected real and synthetic visual data of air cargo ground operations and studied different labeling strategies to overcome the challenges of occlusion, resource shape variation, and detection stability. We then fine-tuned the pre-trained YOLO11n object detection model on the resulting labeled datasets and paired it with the BoT-SORT tracking algorithm. The results are promising: we achieved an overall mAP50-95 of 0.883 on real data and 0.929 on synthetic data. This supports the idea that computer vision is a suitable approach for connecting the real with the virtual in a digital twin for air cargo ground operations monitoring and optimization. A future research direction is to improve the method further by designing a tailored tracking algorithm.
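For readers unfamiliar with the reported metric, the following minimal sketch illustrates how mAP50-95 is conventionally defined: mean average precision is computed at ten intersection-over-union (IoU) thresholds from 0.50 to 0.95 in steps of 0.05, and the ten values are averaged. The function names and the sample boxes are hypothetical and chosen for illustration only; this is not the evaluation code used in this work.

```python
# Illustrative sketch only: function names and sample boxes are hypothetical.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Overlap is zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten IoU thresholds 0.50, 0.55, ..., 0.95 behind the "50-95" suffix.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(iou_value):
    """Number of thresholds at which a detection with this IoU counts as a match."""
    return sum(1 for t in IOU_THRESHOLDS if iou_value >= t)


def map50_95(map_per_threshold):
    """Average the per-threshold mAP values (one value per IoU threshold)."""
    if len(map_per_threshold) != len(IOU_THRESHOLDS):
        raise ValueError("expected one mAP value per IoU threshold")
    return sum(map_per_threshold) / len(map_per_threshold)


if __name__ == "__main__":
    predicted = (1, 0, 11, 10)     # hypothetical predicted vehicle box
    ground_truth = (0, 0, 10, 10)  # hypothetical labeled box
    overlap = iou(predicted, ground_truth)
    print(f"IoU = {overlap:.3f}, matched at {thresholds_passed(overlap)} thresholds")
```

Because the stricter thresholds require tightly fitting boxes, mAP50-95 penalizes loose localization far more than mAP50 alone does.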