Causal Forcing significantly outperforms Self Forcing in both visual quality and motion dynamics at the same training budget and inference efficiency, enabling real-time streaming video generation on a single RTX 4090.
We identify a theoretical flaw in the ODE initialization stage of Self Forcing’s training pipeline: a bidirectional teacher should not be used to supervise an autoregressive student, because doing so violates frame-level injectivity. Motivated by this analysis, we propose Causal Forcing, which proceeds in three stages: we first fine-tune a bidirectional base model into an autoregressive diffusion model, then use that model as the teacher for ODE initialization, and finally apply the same DMD stage as in Self Forcing. Our method significantly outperforms Self Forcing in both visual quality and motion dynamics, while keeping the training budget and inference efficiency unchanged.
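For intuition about why a bidirectional teacher is a poor regression target for an autoregressive student, the sketch below uses a toy linear analogy. It is an illustrative assumption, not the formal definition of frame-level injectivity and not the training code: the "teacher" is a matrix mapping per-frame noise to per-frame outputs, a bidirectional teacher is a full matrix, and a causal student is restricted to a lower-triangular matrix. The names `causal_fit_residual`, `bidirectional_teacher`, and `causal_teacher` are hypothetical.

```python
import numpy as np


def causal_fit_residual(teacher: np.ndarray) -> float:
    """Return the part of a linear teacher that no causal student can match.

    Toy assumption: inputs are i.i.d. unit-variance noise, one scalar per frame.
    Under that assumption the best lower-triangular (causal) least-squares fit
    to `teacher` is np.tril(teacher), so the irreducible error is the Frobenius
    norm of the strictly upper-triangular part (dependence on future frames).
    """
    return float(np.linalg.norm(teacher - np.tril(teacher)))


rng = np.random.default_rng(0)
num_frames = 4

# Bidirectional teacher: every output frame depends on every noise frame.
bidirectional_teacher = rng.normal(size=(num_frames, num_frames))

# Causal teacher: output frame i depends only on noise frames 0..i.
causal_teacher = np.tril(bidirectional_teacher)

bidirectional_gap = causal_fit_residual(bidirectional_teacher)
causal_gap = causal_fit_residual(causal_teacher)

print(f"gap with bidirectional teacher: {bidirectional_gap:.3f}")
print(f"gap with causal teacher:        {causal_gap:.3f}")
```

In this toy setting the causal student can reproduce a causal teacher exactly, but always leaves a nonzero residual against a bidirectional one, since the targets depend on future-frame noise the student never sees. Real video diffusion models are nonlinear, so this only conveys the direction of the argument.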