Industry 5.0 calls for manufacturing systems that sense human behavior with the same resolution used to monitor machines, so that governance can adapt in real time without undermining operator autonomy. This paper introduces an AI-enabled multimodal ensemble Transformer that fuses shop-floor video, tool-embedded signals, programmable logic controller (PLC) tags and human-machine interface (HMI) events through modality-specific encoders and a cross-modal Transformer. The framework produces continuous estimates of operator compliance and supervisory demand within a two-second sliding window, with inference executed on an edge GPU. These real-time estimates parameterize an evolutionary cooperation–competition (ECC) game that formalizes the strategic co-adaptation of operators and engineers. Payoff functions are expressed in terms of measurable costs, benefits and penalties; replicator dynamics are then used to study how compliance and oversight evolve under alternative incentive structures. A connector-assembly case study demonstrates how live behavioral estimates can be fed back into supervisory policy and how the combined perception–game loop helps managers balance quality, throughput and human workload. The proposed approach provides a deployable blueprint for embedding AI-powered behavioral analytics into sociotechnical manufacturing systems, advancing the human-centric and adaptive ambitions of Industry 5.0.
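
To make the replicator-dynamics step concrete, the following is a minimal sketch of a two-population replicator system in which x is the share of operators who comply and y is the share of engineers who supervise. It is an illustration under stated assumptions, not the ECC model itself: the payoff structure, the `Payoffs` fields, the parameter values and the forward-Euler integrator are all hypothetical and are not taken from the paper. In the framework described above, the initial shares would come from the perception model's estimates; here they are placeholder numbers.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Payoffs:
    """Hypothetical payoff parameters (illustrative, not from the paper)."""
    reward: float     # operator benefit from complying
    effort: float     # operator cost of complying
    penalty: float    # operator penalty when non-compliance is supervised
    recovery: float   # engineer gain from catching non-compliance
    oversight: float  # engineer cost of supervising


def replicator_step(x: float, y: float, p: Payoffs, dt: float) -> Tuple[float, float]:
    """One forward-Euler step of the two-population replicator equations.

    x: share of operators who comply; y: share of engineers who supervise.
    """
    # Payoff advantage of complying over deviating, given oversight level y.
    adv_comply = (p.reward - p.effort) + p.penalty * y
    # Payoff advantage of supervising over not supervising, given compliance x.
    adv_supervise = p.recovery * (1.0 - x) - p.oversight
    x_next = x + dt * x * (1.0 - x) * adv_comply
    y_next = y + dt * y * (1.0 - y) * adv_supervise
    # Clamp to [0, 1] to guard against numerical overshoot at the boundary.
    return min(max(x_next, 0.0), 1.0), min(max(y_next, 0.0), 1.0)


def simulate(x0: float, y0: float, p: Payoffs,
             dt: float = 0.01, steps: int = 2000) -> List[Tuple[float, float]]:
    """Integrate from (x0, y0) and return the full trajectory of (x, y) pairs."""
    x, y = x0, y0
    trajectory = [(x, y)]
    for _ in range(steps):
        x, y = replicator_step(x, y, p, dt)
        trajectory.append((x, y))
    return trajectory


if __name__ == "__main__":
    # Placeholder incentive structure and initial shares (assumed values).
    params = Payoffs(reward=1.0, effort=0.4, penalty=0.5,
                     recovery=1.0, oversight=0.3)
    x_final, y_final = simulate(0.2, 0.5, params)[-1]
    print(f"compliance share: {x_final:.3f}, oversight share: {y_final:.3f}")
```

Varying `penalty` or `oversight` in this sketch is one way to explore how alternative incentive structures shift the long-run mix of compliance and supervision. When `reward` is below `effort`, compliance is no longer favored unconditionally and the trajectory depends on the level of oversight.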