FashionMirror: Co-attention Feature-remapping Virtual Try-on with Sequential Template Poses
Journal
Proceedings of the IEEE International Conference on Computer Vision
End Page
13798
ISBN
9781665428125
Date Issued
2021-01-01
Author(s)
Abstract
Virtual try-on has drawn increasing attention. Prior work tackles the task by warping clothes and fusing information at the pixel level with the help of semantic segmentation. However, semantic segmentation is time-consuming and easily accumulates errors over time. Moreover, warping at the pixel level rather than the feature level limits performance (e.g., it cannot generate different views) and is unstable, since any misalignment appears directly in the output. In contrast, information fused at the feature level can be further refined by convolution to obtain the final result. Based on these observations, we propose a co-attention feature-remapping framework, FashionMirror, that generates try-on results according to the driven-pose sequence in two stages. In the first stage, we take the source human image and the target try-on clothes to predict the removed mask and the try-on clothing mask, which replaces pre-processed semantic segmentation and reduces inference time. In the second stage, we first remove the clothes on the source human via the removed mask and warp the clothing features, conditioned on the try-on clothing mask, to fit the human in the next frame. Meanwhile, we predict optical flows from consecutive 2D poses and warp the source human to the next frame at the feature level. We then enhance the clothing features and source-human features in every frame to generate realistic try-on results with spatio-temporal smoothness. Both qualitative and quantitative results show that FashionMirror outperforms state-of-the-art virtual try-on approaches.
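The abstract's central idea of warping at the feature level (rather than the pixel level) can be sketched as follows. This is a minimal numpy illustration, not the authors' implementation: the function name `warp_features`, the backward-flow convention, and nearest-neighbour sampling are all simplifying assumptions made for this sketch.

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a feature map by a backward optical-flow field.

    feat: array of shape (C, H, W) -- multi-channel features.
    flow: array of shape (2, H, W) -- (dx, dy) offsets telling each
          target location where to sample from in the source.

    Nearest-neighbour sampling; out-of-range sample coordinates are
    clamped to the border. Because the warp acts on features, any
    misalignment can still be smoothed by later convolutions instead
    of appearing directly in the rendered pixels.
    """
    C, H, W = feat.shape
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    src_y = np.clip(np.round(ys + flow[1]).astype(int), 0, H - 1)
    src_x = np.clip(np.round(xs + flow[0]).astype(int), 0, W - 1)
    # Gather source features for every target pixel, per channel.
    return feat[:, src_y, src_x]

# Toy usage: shift a 2x2 single-channel feature map one pixel along x.
feat = np.array([[[1.0, 2.0],
                  [3.0, 4.0]]])          # shape (1, 2, 2)
flow = np.zeros((2, 2, 2))
flow[0] = 1.0                            # sample from x+1 everywhere
warped = warp_features(feat, flow)       # columns shift; border clamps
```

In FashionMirror this kind of warp would be applied twice per frame: once to the clothing features (conditioned on the predicted try-on clothing mask) and once to the source-human features (driven by the flow predicted from consecutive 2D poses), before the fused features are refined into the output frame.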
Type
conference paper