Encouraging consumers to return unused or end-of-life cellphones remains a major challenge in improving cellphone reverse logistics systems. This study introduces a reinforcement learning-based framework to guide strategic decisions that can positively influence consumer return behavior. The approach focuses on three strategic themes: protecting user data, providing timely and attractive incentives, and maintaining effective communication with consumers. These themes shape a learning environment in which the agent adapts its actions to improve return outcomes. The problem is formulated as a Markov Decision Process (MDP) whose state evolves with variables such as consumer trust, product condition, and user satisfaction. Within this environment, a Q-learning agent is trained to choose among five possible interventions aimed at encouraging phone returns. Each action is aligned with one or more of the strategic themes, enabling a clear analysis of how individual policies affect behavior. Simulation results show that strategies focused on ensuring privacy and offering efficient incentives lead to significantly improved return rates and agent rewards. By combining behavioral factors with data-driven learning, the framework provides a robust tool for policymakers and companies aiming to improve sustainability in electronics recovery. Ultimately, this research supports circular economy goals by demonstrating how intelligent decision systems can align strategic interventions with consumer behavior to enhance reverse logistics performance in the cellphone sector.
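To make the formulation above concrete, the following is a minimal sketch of tabular Q-learning over a state built from consumer trust, product condition, and user satisfaction. It is an illustration under stated assumptions, not the study's implementation: the three-level discretization, the action labels, the transition rules, the costs, and the reward values are all hypothetical, since the text specifies only that there are five interventions tied to the three strategic themes.

```python
import random

# Assumption: each state variable is discretized to 3 levels (0 = low, 2 = high).
LEVELS = 3

# Assumption: hypothetical labels for the five interventions; the source
# gives only their number and their link to the three strategic themes.
ACTIONS = [
    "data_wipe_guarantee",     # data protection
    "instant_cash_incentive",  # incentive
    "trade_in_credit",         # incentive
    "reminder_message",        # communication
    "no_action",
]
# Assumption: illustrative per-action costs, in the same units as the reward.
COSTS = [0.2, 0.5, 0.3, 0.1, 0.0]


def step(state, action, rng):
    """Toy transition and reward model (assumed, not taken from the study)."""
    trust, condition, satisfaction = state
    top = LEVELS - 1
    if action == 0:
        trust = min(trust + 1, top)
    elif action in (1, 2):
        satisfaction = min(satisfaction + 1, top)
    elif action == 3 and rng.random() < 0.5:
        trust = min(trust + 1, top)
    # Return probability rises with trust and satisfaction (at most 0.75 here).
    p_return = 0.05 + 0.2 * trust + 0.15 * satisfaction
    returned = rng.random() < p_return
    # A returned phone in better condition is worth more to the collector.
    reward = (1.0 + 0.5 * condition if returned else 0.0) - COSTS[action]
    return (trust, condition, satisfaction), reward, returned


def train(episodes=2000, horizon=10, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = {}  # maps state tuple -> list of action values
    for _ in range(episodes):
        state = tuple(rng.randrange(LEVELS) for _ in range(3))
        for _ in range(horizon):
            values = q.setdefault(state, [0.0] * len(ACTIONS))
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=values.__getitem__)
            next_state, reward, done = step(state, action, rng)
            next_values = q.setdefault(next_state, [0.0] * len(ACTIONS))
            # Q-learning target: bootstrap from the best next action
            # unless the episode ended with a return.
            target = reward if done else reward + gamma * max(next_values)
            values[action] += alpha * (target - values[action])
            if done:
                break
            state = next_state
    return q


if __name__ == "__main__":
    q_table = train()
    for state in sorted(q_table):
        best = max(range(len(ACTIONS)), key=q_table[state].__getitem__)
        print(state, ACTIONS[best])
```

Running the script prints the greedy intervention for each visited state under these assumed dynamics. Because every number in the environment is invented, the resulting policy says nothing about real consumer behavior; the sketch only shows how a state, an action set, and a reward signal fit together in a Q-learning loop.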