
RLHF-trained AI systems are systematically rewarded for producing outputs that score well on average human preference — which mathematically pushes them toward creative mediocrity. This paper defines the B+ Trap, identifies RLHF as the structural mechanism, and proposes a Rebel AI framework built on novelty search, adversarial divergent agents, and metacognitive feedback.