Abstract: The current training strategies based on knowledge distillation for image captioning assume that each learning model possesses complete learning value, lacking review and guidance mechanisms ...