OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

Abstract

Watermarking diffusion-generated images is crucial for copyright protection and user tracking. However, current diffusion watermarking methods face significant limitations: zero-bit watermarking systems lack the capacity for large-scale user tracking, while multi-bit methods are highly sensitive to certain image transformations or generative attacks and therefore lack comprehensive robustness. In this paper, we propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark inserts a structural watermark early in denoising to resist generative attacks and a detail watermark late to withstand image transformations, with tailored regularization terms to preserve image quality and ensure imperceptibility. Because memory consumption during optimization would otherwise grow linearly with the number of denoising steps N, OptMark incorporates adjoint gradient methods, reducing memory usage from O(N) to O(1). Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking that is resilient to valuemetric transformations, geometric transformations, editing, and regeneration attacks.

Methods

The robust watermark is embedded into the diffusion latent space during generation through inference-time optimization. In the decoding phase, a pre-trained message decoder extracts the watermark embedding, and the secret message is retrieved by comparing this embedding against a predefined key carrier.
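To make the decoding step concrete, here is a minimal sketch of one way a decoded embedding could be compared against a key carrier. It assumes the carrier is a set of orthonormal directions and that each bit is read from the sign of the projection onto one direction; this page does not specify the comparison rule, so `make_carrier`, `decode_bits`, the 48-bit length, and the 256-dimensional embedding are all hypothetical stand-ins, and the "decoder output" is simulated rather than produced by a real message decoder.

```python
import numpy as np

def make_carrier(num_bits: int, dim: int, seed: int) -> np.ndarray:
    """Hypothetical key carrier: num_bits orthonormal directions derived from a secret seed."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((dim, num_bits)))
    return q.T  # shape (num_bits, dim), rows are orthonormal

def decode_bits(embedding: np.ndarray, carrier: np.ndarray) -> np.ndarray:
    """Assumed rule: project onto each carrier direction; the sign gives one bit."""
    return (carrier @ embedding > 0).astype(np.uint8)

def bit_accuracy(decoded: np.ndarray, message: np.ndarray) -> float:
    """Fraction of bits that match the embedded message."""
    return float(np.mean(decoded == message))

rng = np.random.default_rng(0)
num_bits, dim = 48, 256                       # illustrative sizes, not from the paper
carrier = make_carrier(num_bits, dim, seed=1234)
message = rng.integers(0, 2, size=num_bits).astype(np.uint8)

# Simulated decoder output: an embedding aligned with the message signs plus mild noise.
signs = 2.0 * message - 1.0
embedding = carrier.T @ signs + 0.1 * rng.standard_normal(dim)

decoded = decode_bits(embedding, carrier)
accuracy = bit_accuracy(decoded, message)
print(f"bit accuracy: {accuracy:.3f}")
```

Under this assumed rule, imprinting amounts to pushing each projection to the correct side of zero, and the margin of each projection determines how much distortion a bit can survive.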

OptMark’s imprinting process consists of two sequential stages: first, a structural watermark is injected into the initial latent state of generation; then, a detail watermark is embedded at an intermediate timestep. These complementary watermarks work together to maximize overall robustness.
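Optimizing the initial latent needs a gradient that passes through every denoising step, which is where the O(N) memory cost noted in the abstract comes from. The sketch below illustrates the general adjoint idea on a toy Euler sampler: instead of storing all intermediate states, it steps the state backward approximately and pulls the gradient back one step at a time, so only two vectors are held in memory. The drift `tanh(W z)`, the quadratic loss, and all function names are assumptions for illustration; this is not OptMark's sampler or its exact adjoint formulation.

```python
import numpy as np

def drift(z, W):
    """Toy stand-in for one denoiser evaluation (hypothetical, not OptMark's network)."""
    return np.tanh(W @ z)

def drift_vjp(z, a, W):
    """Vector-Jacobian product a^T (d drift / d z) for drift(z) = tanh(W z)."""
    return W.T @ (a * (1.0 - np.tanh(W @ z) ** 2))

def sample(z0, W, n_steps, h):
    """Euler sampler that keeps only the current state."""
    z = z0.copy()
    for _ in range(n_steps):
        z = z + h * drift(z, W)
    return z

def loss(z0, target, W, n_steps, h):
    """Toy objective on the final sample."""
    return 0.5 * np.sum((sample(z0, W, n_steps, h) - target) ** 2)

def adjoint_grad(z_final, target, W, n_steps, h):
    """Approximate d loss / d z0 while storing only the state z and the adjoint a."""
    z = z_final.copy()
    a = z_final - target                    # d loss / d z_final
    for _ in range(n_steps):
        z = z - h * drift(z, W)             # approximately undo one Euler step
        a = a + h * drift_vjp(z, a, W)      # pull the gradient back through that step
    return a

def finite_diff_grad(z0, target, W, n_steps, h, eps=1e-5):
    """Central-difference reference gradient, used only to check the sketch."""
    g = np.zeros_like(z0)
    for i in range(z0.size):
        e = np.zeros_like(z0)
        e[i] = eps
        g[i] = (loss(z0 + e, target, W, n_steps, h)
                - loss(z0 - e, target, W, n_steps, h)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
dim, n_steps = 8, 500
h = 1.0 / n_steps
W = 0.5 * rng.standard_normal((dim, dim)) / np.sqrt(dim)
z0 = rng.standard_normal(dim)
target = rng.standard_normal(dim)

g_adj = adjoint_grad(sample(z0, W, n_steps, h), target, W, n_steps, h)
g_fd = finite_diff_grad(z0, target, W, n_steps, h)
rel_err = np.linalg.norm(g_adj - g_fd) / np.linalg.norm(g_fd)
print(f"relative error vs. finite differences: {rel_err:.2e}")
```

The backward state reconstruction is only approximate, so the gradient carries a discretization error that shrinks with the step size; that is the trade made for memory that no longer depends on the number of steps.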

Robustness Performance Comparison on Various Attacks

Multi-bit Comparison

Method	None Bit Acc.	None TPR	Geometric Bit Acc.	Geometric TPR	Valuemetric Bit Acc.	Valuemetric TPR	Editing Bit Acc.	Editing TPR	Regeneration Bit Acc.	Regeneration TPR	Average Bit Acc.	Average TPR
DwtDct	0.828	0.576	0.501	0.000	0.509	0.363	0.719	0.256	0.494	0.000	0.573	0.125
DwtDctSvd	1.000	1.000	0.468	0.000	0.701	0.405	0.837	0.671	0.605	0.022	0.679	0.340
RivaGAN	0.994	0.994	0.742	0.492	0.974	0.966	0.914	0.775	0.570	0.003	0.835	0.641
SSL Watermark	1.000	1.000	0.996	0.998	0.989	0.994	0.922	0.750	0.596	0.005	0.906	0.763
Stable Signature	0.995	0.998	0.810	0.496	0.824	0.724	0.253	0.498	0.605	0.011	0.757	0.509
Gaussian Shading	1.000	1.000	0.634	0.250	0.998	0.997	0.870	0.750	0.986	0.958	0.880	0.756
AquaLoRA	0.963	0.979	0.690	0.271	0.954	0.973	0.858	0.702	0.930	0.955	0.866	0.741
OptMark (ours)	1.000	1.000	0.998	1.000	0.998	1.000	0.990	0.979	0.923	0.872	0.983	0.972

Zero-bit Comparison

Method	None	Geometric	Valuemetric	Editing	Regeneration	Average
Tree-Ring	1.000	0.773	0.970	0.765	0.953	0.874
RingID	1.000	0.750	0.999	0.717	0.814	0.841
WIND	1.000	0.985	0.976	0.748	1.000	0.930
OptMark (ours)	1.000	1.000	1.000	1.000	0.993	0.999

Bold indicates best performance; red text indicates poor performance.
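The tables above report bit accuracy and TPR without restating the detection rule. One common convention for multi-bit watermarks, sketched here as an assumption rather than as the protocol used for these numbers, is to declare an image watermarked when the number of matching bits reaches a threshold chosen so that a random, unwatermarked image passes with probability at most a target false positive rate. The message length and target rate in the example are hypothetical.

```python
from math import comb

def false_positive_rate(num_bits: int, threshold: int) -> float:
    """P(matches >= threshold) when every bit matches by chance with probability 0.5."""
    tail = sum(comb(num_bits, k) for k in range(threshold, num_bits + 1))
    return tail / 2 ** num_bits

def min_threshold(num_bits: int, target_fpr: float) -> int:
    """Smallest match count whose chance false positive rate is <= target_fpr."""
    for t in range(num_bits + 1):
        if false_positive_rate(num_bits, t) <= target_fpr:
            return t
    return num_bits + 1  # target is stricter than even a perfect match allows

def is_detected(decoded, message, threshold: int) -> bool:
    """Detection decision for one image from its decoded bits."""
    matches = sum(int(a == b) for a, b in zip(decoded, message))
    return matches >= threshold

# Illustrative settings only: a 48-bit message and a 1e-6 false positive target.
num_bits, target_fpr = 48, 1e-6
tau = min_threshold(num_bits, target_fpr)
print(f"need at least {tau}/{num_bits} matching bits "
      f"(chance FPR {false_positive_rate(num_bits, tau):.2e})")
```

Under such a rule, TPR is the fraction of attacked watermarked images whose match count still clears the threshold, which is why a method can show moderate bit accuracy but near-zero TPR once accuracy falls below the required fraction.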

Image Quality Comparison

Quantitative Comparison of Watermarked Image Quality

Method	FID ↓	CLIP Score ↑
w/o watermark	124.309	0.3686
SSL Watermark	128.053	0.3555
Gaussian Shading	127.756	0.3646
OptMark (ours)	127.378	0.3630

Lower FID is better; higher CLIP Score is better. Bold indicates the best per column.
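As a small worked reading of the quality table, the snippet below computes each method's change relative to the unwatermarked baseline, using only the values listed above. The dictionary layout and variable names are illustrative.

```python
# Values copied from the image quality table above.
baseline = {"fid": 124.309, "clip": 0.3686}
methods = {
    "SSL Watermark": {"fid": 128.053, "clip": 0.3555},
    "Gaussian Shading": {"fid": 127.756, "clip": 0.3646},
    "OptMark (ours)": {"fid": 127.378, "clip": 0.3630},
}

# Positive FID delta and negative CLIP delta both mean quality moved away from the baseline.
deltas = {
    name: {"fid": m["fid"] - baseline["fid"], "clip": m["clip"] - baseline["clip"]}
    for name, m in methods.items()
}

for name, d in deltas.items():
    print(f"{name:18s} FID {d['fid']:+.3f}  CLIP Score {d['clip']:+.4f}")
```

Read this way, OptMark has the smallest FID increase of the three watermarked methods (3.069), while its CLIP Score drop (0.0056) falls between Gaussian Shading (0.0040) and SSL Watermark (0.0131).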

Qualitative Comparison of Watermarked Image Quality

BibTeX

@misc{xing2025optmarkrobustmultibitdiffusion,
      title={OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization}, 
      author={Jiazheng Xing and Hai Ci and Hongbin Xu and Hangjie Yuan and Yong Liu and Mike Zheng Shou},
      year={2025},
      eprint={2508.21727},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2508.21727}, 
}