Uncontrolled or poorly managed traffic flow leads to congestion, increased safety risks, and excessive emissions, making sustainable traffic management a key challenge. Variable Speed Limit (VSL) control has emerged as a promising way to regulate traffic flow dynamically. However, designing an effective VSL strategy is difficult because its objectives (mobility, safety, and environmental sustainability) conflict with one another. In this study, we investigate the potential of Deep Reinforcement Learning (DRL) to optimize VSL decisions while balancing these competing goals. We define a gain function as the reward and vary its weight configuration to assess the impact on speed regulation, traffic efficiency, CO₂ emissions, and safety. The evaluation uses a microscopic case study of a controlled highway in Morocco, modeled in the SUMO traffic simulator, on which we test and compare three DRL algorithms to determine how effectively each learns sustainable VSL policies. To analyze the performance of the trained agents, we track traffic indicators that reflect flow efficiency, risk reduction, and environmental impact. The results show that the agents can improve overall traffic conditions by striking a balance among mobility, safety, and sustainability.
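The abstract does not state the form of the gain function, so the following is only a minimal sketch of how a weighted multi-objective reward of this kind might look, assuming a simple weighted sum over terms already scaled to [0, 1]. The names `RewardWeights` and `gain`, the input terms, and all weight values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical weights for the three objectives (not from the study)."""
    mobility: float
    safety: float
    emissions: float


def gain(mean_speed_norm: float, risk_norm: float, co2_norm: float,
         w: RewardWeights) -> float:
    """Weighted-sum reward; every input is assumed to be scaled to [0, 1]."""
    # Higher normalized speed is rewarded; higher risk and CO2 are penalized.
    return (w.mobility * mean_speed_norm
            - w.safety * risk_norm
            - w.emissions * co2_norm)


# Two illustrative weight configurations (placeholder values).
balanced = RewardWeights(mobility=1.0, safety=1.0, emissions=1.0)
mobility_first = RewardWeights(mobility=2.0, safety=0.5, emissions=0.5)
```

Evaluating the same traffic state under both configurations, for example `gain(0.5, 0.25, 0.25, balanced)` and `gain(0.5, 0.25, 0.25, mobility_first)`, shows how the choice of weights changes the signal an agent would learn from.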