OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization

1 Zhejiang University
2 Show Lab, National University of Singapore
*Equal Contribution, Corresponding Author

Abstract

Watermarking diffusion-generated images is crucial for copyright protection and user tracking. However, current diffusion watermarking methods face significant limitations: zero-bit watermarking systems lack the capacity for large-scale user tracking, while multi-bit methods are highly sensitive to certain image transformations or generative attacks, resulting in a lack of comprehensive robustness. In this paper, we propose OptMark, an optimization-based approach that embeds a robust multi-bit watermark into the intermediate latents of the diffusion denoising process. OptMark strategically inserts a structural watermark early to resist generative attacks and a detail watermark late to withstand image transformations, with tailored regularization terms to preserve image quality and ensure imperceptibility. To address the challenge of memory consumption growing linearly with the number of denoising steps during optimization, OptMark incorporates adjoint gradient methods, reducing memory usage from O(N) to O(1). Experimental results demonstrate that OptMark achieves invisible multi-bit watermarking while ensuring robust resilience against valuemetric transformations, geometric transformations, editing, and regeneration attacks.
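The memory argument can be illustrated on a toy problem. The sketch below is not OptMark's sampler: it assumes a scalar ODE dx/dt = -x^3 integrated with explicit Euler steps and a hypothetical loss 0.5 * x_N^2, and all function names are made up for illustration. Plain backpropagation keeps all N intermediate states, while the adjoint-style variant re-integrates the state backward alongside the adjoint variable and so keeps only two scalars.

```python
def step(x, h):
    """One explicit Euler step of the toy dynamics dx/dt = -x**3."""
    return x - h * x**3


def grad_backprop(x0, h, n):
    """Gradient of 0.5 * x_N**2 w.r.t. x0, storing every state: O(N) memory."""
    xs = [x0]
    for _ in range(n):
        xs.append(step(xs[-1], h))
    lam = xs[-1]  # dL/dx_N for L = 0.5 * x_N**2
    for x in reversed(xs[:-1]):
        lam *= 1.0 - 3.0 * h * x**2  # chain rule through one Euler step
    return lam


def grad_adjoint(x0, h, n):
    """Same gradient with O(1) memory: only x and lam are kept."""
    x = x0
    for _ in range(n):
        x = step(x, h)  # forward pass, nothing is stored
    lam = x
    for _ in range(n):
        x = x + h * x**3  # approximate reverse step recovers the earlier state
        lam *= 1.0 - 3.0 * h * x**2
    return lam


g_full = grad_backprop(1.0, 0.001, 1000)
g_adj = grad_adjoint(1.0, 0.001, 1000)
print(g_full, g_adj)
```

With a small step size the two gradients agree closely. The backward re-integration is only approximate, so the adjoint result carries a discretization error that shrinks with h; that is the price paid for not storing the trajectory.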

Methods

OptMark Method Overview. The robust watermark is embedded into the diffusion latent space during generation through inference-time optimization. In the decoding phase, the watermark embedding is extracted using a pre-trained message decoder, and the secret message is retrieved by comparing the decoded watermark embedding against a predefined key carrier.

OptMark Message Embedding. OptMark's imprinting process consists of two sequential stages: first, a structure watermark is injected into the initial latent state of generation; then, a detail watermark is embedded at an intermediate timestep. These complementary watermarks work in concert to maximize overall robustness.
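The comparison against a key carrier in the decoding phase can be sketched as follows. This is a minimal illustration under assumptions the page does not confirm: the key carrier is taken to be one random Gaussian vector per bit, and each bit is read from the sign of the inner product between the decoded embedding and its carrier. The helper `embed_bits` is a hypothetical stand-in for the real pipeline (generation, attack, message decoder) and exists only to make the round trip runnable.

```python
import random


def make_carriers(num_bits, dim, seed=0):
    """Assumed key: one fixed random Gaussian carrier vector per message bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def decode_bits(embedding, carriers):
    """Read bit i as 1 if the embedding correlates positively with carrier i."""
    return [
        1 if sum(e * c for e, c in zip(embedding, carrier)) > 0 else 0
        for carrier in carriers
    ]


def embed_bits(bits, carriers):
    """Hypothetical stand-in for the decoder output: signed sum of carriers."""
    dim = len(carriers[0])
    signs = [1.0 if b else -1.0 for b in bits]
    return [sum(s * c[j] for s, c in zip(signs, carriers)) for j in range(dim)]


carriers = make_carriers(num_bits=8, dim=1024)
message = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = decode_bits(embed_bits(message, carriers), carriers)
print(recovered == message)
```

Because high-dimensional random carriers are nearly orthogonal, each inner product is dominated by the term carrying its own bit, which is why a sign test suffices in this toy setting.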

Robustness Performance Comparison on Various Attacks

Method            None            Geometric       Valuemetric     Editing         Regeneration    Average
                  Bit Acc. TPR    Bit Acc. TPR    Bit Acc. TPR    Bit Acc. TPR    Bit Acc. TPR    Bit Acc. TPR
DwtDct            0.828    0.576  0.501    0.000  0.509    0.363  0.719    0.256  0.494    0.000  0.573    0.125
DwtDctSvd         1.000    1.000  0.468    0.000  0.701    0.405  0.837    0.671  0.605    0.022  0.679    0.340
RivaGAN           0.994    0.994  0.742    0.492  0.974    0.966  0.914    0.775  0.570    0.003  0.835    0.641
SSL Watermark     1.000    1.000  0.996    0.998  0.989    0.994  0.922    0.750  0.596    0.005  0.906    0.763
Stable Signature  0.995    0.998  0.810    0.496  0.824    0.724  0.253    0.498  0.605    0.011  0.757    0.509
Gaussian Shading  1.000    1.000  0.634    0.250  0.998    0.997  0.870    0.750  0.986    0.958  0.880    0.756
AquaLoRA          0.963    0.979  0.690    0.271  0.954    0.973  0.858    0.702  0.930    0.955  0.866    0.741
OptMark (ours)    1.000    1.000  0.998    1.000  0.998    1.000  0.990    0.979  0.923    0.872  0.983    0.972

Bold indicates the best performance; red text indicates poor performance.
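The two metrics in the table can be computed along the following lines. This is a sketch under assumptions: the page does not state the message length or the false positive rate at which TPR is measured, so `num_bits` and `max_fpr` below are hypothetical. It assumes detection by thresholding the number of matching bits, with the threshold chosen from the binomial tail for an unwatermarked image whose decoded bits are uniformly random.

```python
from math import comb


def bit_accuracy(decoded, target):
    """Fraction of decoded bits that match the embedded message."""
    return sum(d == t for d, t in zip(decoded, target)) / len(target)


def match_threshold(num_bits, max_fpr):
    """Smallest tau with P(matches >= tau | random bits) <= max_fpr.

    Returns num_bits + 1 if even a perfect match is more likely than max_fpr.
    """
    tail = 0.0
    for tau in range(num_bits, -1, -1):
        tail += comb(num_bits, tau) / 2**num_bits  # adds P(matches == tau)
        if tail > max_fpr:
            return tau + 1
    return 0


def true_positive_rate(decoded_list, target, max_fpr):
    """Share of watermarked samples whose match count reaches the threshold."""
    tau = match_threshold(len(target), max_fpr)
    hits = sum(
        sum(d == t for d, t in zip(decoded, target)) >= tau
        for decoded in decoded_list
    )
    return hits / len(decoded_list)


target = [1, 1, 1, 1]
samples = [[1, 1, 1, 1], [1, 0, 1, 1]]
print(bit_accuracy(samples[1], target))           # 0.75
print(match_threshold(4, 0.1))                    # 4
print(true_positive_rate(samples, target, 0.1))   # 0.5
```

The example shows why the two columns can diverge: a sample with 0.75 bit accuracy still fails detection when the threshold demands all four bits, so TPR can collapse while bit accuracy stays well above chance.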

Image Quality Comparison

Quantitative Comparison of Watermarked Image Quality

Method            FID ↓     CLIP Score ↑
w/o watermark     124.309   0.3686
SSL Watermark     128.053   0.3555
Gaussian Shading  127.756   0.3646
OptMark (ours)    127.378   0.3630

Lower FID is better; higher CLIP Score is better. Bold indicates the best per column.

Qualitative Comparison of Watermarked Image Quality

OptMark Results

BibTeX

@misc{xing2025optmarkrobustmultibitdiffusion,
      title={OptMark: Robust Multi-bit Diffusion Watermarking via Inference Time Optimization}, 
      author={Jiazheng Xing and Hai Ci and Hongbin Xu and Hangjie Yuan and Yong Liu and Mike Zheng Shou},
      year={2025},
      eprint={2508.21727},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2508.21727}, 
    }