FlashEmbedding: Storing embedding tables in SSD for large-scale recommender systems

Wan H., Sun X., Cui Y., Yang C.-L., Kuo T.-W., Xue C.J.
Conference paper, 2020 (year as given in the record)
DOI: 10.1145/3476886.3477511
Scopus EID: 2-s2.0-85118174515
Scopus record: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85118174515&doi=10.1145%2f3476886.3477511&partnerID=40&md5=04ac837135b7f22418e0b3e4b862ad08
Repository record: https://scholars.lib.ntu.edu.tw/handle/123456789/632316

Abstract
We present FlashEmbedding, a hardware/software co-design solution for storing embedding tables on SSDs for large-scale recommendation inference on memory-capacity-limited systems. FlashEmbedding combines an embedding semantic-aware SSD, an embedding-oriented software cache, and pipelining techniques to improve overall performance. We evaluate FlashEmbedding with our FPGA-based prototype SSD on a real-world public dataset. In a memory-capacity-limited system, FlashEmbedding achieves up to 17.44× lower embedding-lookup latency and 2.89× lower end-to-end latency than the baseline solution. © 2021 ACM.

Keywords: Embedding; Recommender systems; Solid-state drive (SSD)
Index terms: Hardware/software codesign; Recommender systems; Semantics; Design solutions; Embeddings; Large-scales; Memory capacity; Performance; Semantic-aware; Software caches; Software pipeline; Solid-state drive
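The abstract names an embedding-oriented software cache in front of the SSD but does not describe its policy or the device interface. The following is a minimal sketch of the general idea only, assuming an LRU policy keyed by embedding ID and a hypothetical device object whose `read_vectors` call returns just the requested vectors in one batched request. `SimulatedEmbeddingSSD`, `EmbeddingCache`, and `read_vectors` are illustrative names, not FlashEmbedding's actual interfaces: hot vectors are served from DRAM, and all misses in a lookup batch are deduplicated and fetched together.

```python
from collections import OrderedDict
from typing import Dict, List, Optional, Sequence


class SimulatedEmbeddingSSD:
    """Hypothetical stand-in for an SSD that can return individual embedding vectors."""

    def __init__(self, table: Dict[int, List[float]]):
        self.table = table
        self.reads = 0  # number of batched device requests issued

    def read_vectors(self, ids: Sequence[int]) -> Dict[int, List[float]]:
        # Assumption: one request returns only the requested vectors, not whole flash pages.
        self.reads += 1
        return {i: self.table[i] for i in ids}


class EmbeddingCache:
    """LRU cache (an assumed policy) holding at most `capacity` vectors in DRAM."""

    def __init__(self, device: SimulatedEmbeddingSSD, capacity: int):
        self.device = device
        self.capacity = capacity
        self.entries: "OrderedDict[int, List[float]]" = OrderedDict()
        self.hits = 0    # counted once per unique ID per batch
        self.misses = 0  # counted once per unique ID per batch

    def lookup(self, ids: Sequence[int]) -> List[List[float]]:
        found: Dict[int, Optional[List[float]]] = {}
        missing: List[int] = []
        for i in ids:
            if i in found:  # duplicate ID within this batch: already handled
                continue
            if i in self.entries:
                self.entries.move_to_end(i)  # mark as most recently used
                found[i] = self.entries[i]
                self.hits += 1
            else:
                found[i] = None  # placeholder so duplicates are fetched only once
                missing.append(i)
                self.misses += 1
        if missing:
            # All misses in the batch go to the device in a single request.
            for i, vec in self.device.read_vectors(missing).items():
                found[i] = vec
                self._insert(i, vec)
        # Build the result from `found`, so evictions during this batch cannot lose vectors.
        return [found[i] for i in ids]

    def _insert(self, i: int, vec: List[float]) -> None:
        self.entries[i] = vec
        self.entries.move_to_end(i)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used vector


if __name__ == "__main__":
    table = {i: [float(i), float(i) * 0.5] for i in range(10)}
    ssd = SimulatedEmbeddingSSD(table)
    cache = EmbeddingCache(ssd, capacity=3)
    print(cache.lookup([1, 2, 1]))  # [[1.0, 0.5], [2.0, 1.0], [1.0, 0.5]]
    print(cache.lookup([2, 3]))     # [[2.0, 1.0], [3.0, 1.5]]
    print(ssd.reads, cache.hits, cache.misses)  # 2 1 3
```

Batching the misses reflects the general motivation for putting a cache in front of an SSD: the DRAM tier absorbs repeated accesses to popular embeddings, and the remaining accesses are grouped so that each device request carries useful work. How FlashEmbedding itself sizes, indexes, or pipelines its cache is not stated in this record.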