On Training Sample Memorization: Lessons from Benchmarking Generative Modeling with a Large-scale Competition

Bai C.-Y; Lin H.-T; Raffel C; Kan W.C.-W.

doi:10.1145/3447548.3467198

Journal

Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Pages

2534-2542

Date Issued

2021

Author(s)

Bai C.-Y

Raffel C

Kan W.C.-W.

Hsuan-Tien Lin

DOI

10.1145/3447548.3467198

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85114944301&doi=10.1145%2f3447548.3467198&partnerID=40&md5=461fd1b327c5bba5bf916ca199d2dbf0

https://scholars.lib.ntu.edu.tw/handle/123456789/607410

Abstract

Many recent developments on generative models for natural images have relied on heuristically-motivated metrics that can be easily gamed by memorizing a small sample from the true distribution or training a model directly to improve the metric. In this work, we critically evaluate the gameability of these metrics by designing and deploying a generative modeling competition. Our competition received over 11,000 submitted models. The competitiveness between participants allowed us to investigate both intentional and unintentional memorization in generative modeling. To detect intentional memorization, we propose the "Memorization-Informed Fréchet Inception Distance" (MiFID) as a new memorization-aware metric and design benchmark procedures to ensure that winning submissions made genuine improvements in perceptual quality. Furthermore, we manually inspect the code for the 1000 top-performing models to understand and label different forms of memorization. Our analysis reveals that unintentional memorization is a serious and common issue in popular generative models. The generated images and our memorization labels of those models as well as code to compute MiFID are released to facilitate future studies on benchmarking generative models. © 2021 Owner/Author.

Subjects

benchmark

competition

computer vision

datasets

generative models

memorization

neural networks

Benchmarking

Image enhancement

Frechet

Generative model

Natural images

Perceptual quality

Small samples

Training sample

Data mining

Type

conference paper
