RewardFlow: Generate Images by Optimizing What You Reward

Onkar Kishor Susladkar, Dong-Hwan Jang, Tushar Prakash, Adheesh Sunil Juvekar, Vedant Shah, Ayush Barik, Nabeel Bashir, Muntasir Wahed, Ritish Shrirao, Ismini Lourentzou
PLAN Lab, University of Illinois Urbana-Champaign · Sony Research India

CVPR 2026

Abstract

We introduce RewardFlow, an inversion-free framework that steers pretrained diffusion and flow-matching models at inference time through multi-reward Langevin dynamics. RewardFlow fuses complementary differentiable rewards for semantic alignment, perceptual fidelity, localized grounding, object consistency, and human preference into a single guidance signal.
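The fused guidance signal can be pictured as a single Langevin update whose drift is a weighted sum of reward gradients. Below is a minimal numpy sketch of one such step; the reward functions, weights, and finite-difference gradients are illustrative stand-ins (a real implementation would use autograd on the actual reward models), not the paper's exact procedure.

```python
import numpy as np

def reward_semantic(x):    # hypothetical differentiable reward
    return -np.sum((x - 1.0) ** 2)

def reward_perceptual(x):  # hypothetical differentiable reward
    return -np.sum(x ** 2)

def grad(r, x, eps=1e-4):
    # Central finite differences; stands in for autograd here.
    g = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x)
        d.flat[i] = eps
        g.flat[i] = (r(x + d) - r(x - d)) / (2 * eps)
    return g

def langevin_step(x, rewards, weights, eta=0.01,
                  rng=np.random.default_rng(0)):
    # x <- x + eta * sum_i w_i * grad r_i(x) + sqrt(2*eta) * xi,  xi ~ N(0, I)
    drift = sum(w * grad(r, x) for r, w in zip(rewards, weights))
    noise = np.sqrt(2 * eta) * rng.standard_normal(x.shape)
    return x + eta * drift + noise

x = np.zeros(4)
x = langevin_step(x, [reward_semantic, reward_perceptual], [0.7, 0.3])
```

The noise term is what distinguishes Langevin sampling from plain gradient ascent on the fused reward: it keeps the update a sampler rather than a point optimizer.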

To provide fine-grained instruction-level supervision, RewardFlow adds a differentiable VQA-based reward and a SAM2 text-guided object reward, enabling localized edits and preventing semantic leakage outside the target mask. A prompt-aware adaptive policy extracts semantic primitives from the instruction, infers intent, and dynamically modulates reward weights and step sizes throughout sampling.
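To give a feel for how a prompt-aware policy might modulate reward weights, here is a deliberately simple keyword heuristic. The keywords, weight names, and values are assumptions for illustration only; the paper's policy parses semantic primitives and infers intent rather than matching strings.

```python
def adaptive_weights(instruction: str) -> dict:
    # Base weights for each reward channel (illustrative values).
    w = {"semantic": 1.0, "perceptual": 1.0, "region": 0.5, "vqa": 0.5}
    text = instruction.lower()
    if any(k in text for k in ("replace", "remove", "add")):
        # Localized edit: emphasize the mask-confined localization reward
        # and attribute-level VQA correctness.
        w["region"] = 2.0
        w["vqa"] = 1.5
    if "style" in text:
        # Global stylistic edit: emphasize perceptual fidelity.
        w["perceptual"] = 2.0
    return w
```

The same mechanism could also schedule step sizes, e.g. larger steps early for coarse structure and smaller steps late for fine detail.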

We further tether sampling to the originating latent via a clean-latent KL regularizer, which anchors the drift produced by the fused reward gradient. Across editing and compositional generation benchmarks, RewardFlow obtains state-of-the-art zero-shot fidelity and alignment without any fine-tuning.
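One way to see why a clean-latent KL term anchors the drift: for isotropic Gaussians centered at the current latent z and the originating latent z0, the KL divergence reduces to a quadratic penalty, so its gradient is a linear restoring force. The sketch below, with assumed variable names and weights, shows this tethered update; it is not the paper's exact regularizer.

```python
import numpy as np

def kl_anchor_grad(z, z0, sigma=1.0):
    # For N(z, sigma^2 I) vs N(z0, sigma^2 I):
    #   KL = ||z - z0||^2 / (2 sigma^2),  so  grad_z KL = (z - z0) / sigma^2,
    # a linear pull back toward the clean latent z0.
    return (z - z0) / sigma**2

def tethered_step(z, z0, reward_grad, eta=0.01, kl_weight=0.1,
                  rng=np.random.default_rng(0)):
    # Reward drift minus the KL tether, plus Langevin noise.
    drift = reward_grad - kl_weight * kl_anchor_grad(z, z0)
    return z + eta * drift + np.sqrt(2 * eta) * rng.standard_normal(z.shape)
```

The `kl_weight` hyperparameter trades off edit strength against faithfulness to the source latent.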

RewardFlow abstract diagram

✅ Contributions

  • RewardFlow. We introduce a multi-reward-guided Langevin framework that combines semantic, perceptual, regional, object-level, and human-preference signals to enable controllable, inversion-free editing and generation.

  • Prompt-aware adaptive policy. Our lightweight policy parses semantic primitives from the instruction, infers edit intent, and dynamically adjusts reward weights and step sizes to balance coarse-to-fine optimization.

  • Fine-grained rewards. We design a differentiable VQA reward for attribute-level correctness and a SAM-guided localization reward that penalizes edits leaking outside the region of interest.

  • Theoretical grounding. We prove the update corresponds to a valid discretization of a Langevin SDE targeting a prompt-tilted density, giving a sound foundation for stable convergence during reward-guided sampling.
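The theoretical claim in the last bullet follows the standard overdamped-Langevin identity; the exact SDE, weights, and step-size schedule in the paper may differ, but the structure is:

```latex
% Langevin SDE whose stationary density is the prompt-tilted target.
\begin{aligned}
\mathrm{d}x_t &= \Big[\nabla_x \log p(x_t)
  + \sum_i \lambda_i \nabla_x r_i(x_t)\Big]\,\mathrm{d}t
  + \sqrt{2}\,\mathrm{d}W_t,\\
\pi(x) &\propto p(x)\,\exp\!\Big(\sum_i \lambda_i r_i(x)\Big),
\end{aligned}
```

whose Euler–Maruyama discretization, \(x_{k+1} = x_k + \eta\big[\nabla \log p(x_k) + \sum_i \lambda_i \nabla r_i(x_k)\big] + \sqrt{2\eta}\,\xi_k\) with \(\xi_k \sim \mathcal{N}(0, I)\), is exactly a reward-guided sampling update: samples concentrate where the base model density and the fused rewards are jointly high.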

Quantitative Results

RewardFlow results

RewardFlow achieves high edit fidelity and compositional alignment across editing benchmarks.

Qualitative Results

RewardFlow preserves subject identity and layout while enabling localized style and text edits, delivering clean, compositional results without inversion.
RewardFlow generation showcase

Generation samples guided by multi-reward Langevin dynamics.

More RewardFlow editing

Additional localized edits.

Image generation comparison

Compositional generation vs. baselines.

Comparison results

Comparison against prior reward-guided techniques highlights improved localization and reduced semantic drift.

BibTeX

@inproceedings{rewardflow2026,
  title={RewardFlow: Generate Images by Optimizing What You Reward},
  author={Susladkar, Onkar Kishor and Jang, Dong-Hwan and Prakash, Tushar and Juvekar, Adheesh Sunil and Shah, Vedant and Barik, Ayush and Bashir, Nabeel and Wahed, Muntasir and Shrirao, Ritish and Lourentzou, Ismini},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
  url={https://plan-lab.github.io/Language_Guided_editing_CVPR_26.pdf}
}