16.3 A 28nm 384kb 6T-SRAM Computation-in-Memory Macro with 8b Precision for AI Edge Chips

Journal

Digest of Technical Papers - IEEE International Solid-State Circuits Conference

Journal Volume

64

Pages

250-252

Date Issued

2021

Author(s)

Su J.-W

Chou Y.-C

Liu R

Liu T.-W

Lu P.-J

Wu P.-C

Chung Y.-L

Hung L.-Y

Ren J.-S

Pan T

Li S.-H

Chang S.-C

Sheu S.-S

Lo W.-C

CHIH-I WU

Si X

Lo C.-C

Liu R.-S

Hsieh C.-C

Tang K.-T

Chang M.-F.

DOI

10.1109/ISSCC42613.2021.9365984

URI

https://www.scopus.com/inward/record.uri?eid=2-s2.0-85102386607&doi=10.1109%2fISSCC42613.2021.9365984&partnerID=40&md5=9988e8055afb09741e59688f1049514d

https://scholars.lib.ntu.edu.tw/handle/123456789/632155

Abstract

Recent SRAM-based computation-in-memory (CIM) macros enable mid-to-high-precision multiply-and-accumulate (MAC) operations with improved energy efficiency using ultra-small/small-capacity (0.4-8KB) memory devices. However, advanced CIM-based edge-AI chips favor multiple mid/large-capacity SRAM-CIM macros with high input (IN) and weight (W) precision, to reduce the frequency of data reloads from external DRAM and to avoid the need for additional SRAM buffers or ultra-large on-chip weight buffers. Enlarging memory capacity and throughput increases the parasitic delay on wordlines (WLs) and bitlines (BLs) and the number of parallel computing elements, resulting in longer compute latency (tAC), lower energy efficiency (EF), degraded signal margin, and larger fluctuations in power consumption across data patterns (see Fig. 16.3.1). Recent SRAM-CIM macros tend to avoid in-lab SRAM cells with a logic-based layout in favor of foundry-provided compact-layout 8T cells [2], [3], [5] or 6T cells with local-computing cells (LCCs) [4], [6], to reduce the cell-array area and facilitate manufacturing. This paper presents an SRAM-CIM structure using (1) a segmented-BL charge-sharing (SBCS) scheme for MAC operations, with low energy consumption and a consistently high signal margin across MAC values (MACV); (2) a new LCC, called a source-injection local-multiplication cell (SILMC), to support the SBCS scheme with a consistent signal margin against transistor process variation; and (3) a prioritized-hybrid-ADC (Ph-ADC) to achieve a small area and power overhead for analog readout. A 28nm 384kb SRAM-CIM macro was fabricated using a foundry compact-6T cell, with support for MAC operations with 16 accumulations of 8b inputs and 8b weights and near-full-precision output (20b). This macro achieves a 7.2ns tAC and a 22.75TOPS/W EF for 8b-MAC operations, with an FoM (IN-precision × W-precision × output-ratio × output-channel × EF/tAC) 6× higher than prior work. © 2021 IEEE.
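
The arithmetic behind two figures in the abstract can be sketched in a few lines of Python. This is an illustrative sketch, not code from the paper: the function names are hypothetical, the operands are assumed to be unsigned, and the units of EF (TOPS/W) and tAC (ns) are simply carried through the FoM expression as written in the abstract. The abstract does not give values for output-ratio or output-channel, so the sketch leaves them as parameters.

```python
def full_precision_bits(in_bits: int, w_bits: int, accumulations: int) -> int:
    """Bits needed to hold the largest possible MAC sum.

    Assumption (not stated in the abstract): inputs and weights are unsigned.
    """
    max_product = (2**in_bits - 1) * (2**w_bits - 1)
    return (accumulations * max_product).bit_length()


def figure_of_merit(in_bits: int, w_bits: int, output_ratio: float,
                    output_channels: int, ef: float, tac: float) -> float:
    """FoM as written in the abstract:
    IN-precision x W-precision x output-ratio x output-channel x EF / tAC.
    """
    return in_bits * w_bits * output_ratio * output_channels * ef / tac


# 16 accumulations of 8b x 8b products: matches the 20b output in the abstract.
print(full_precision_bits(8, 8, 16))
```

With 8b inputs, 8b weights and 16 accumulations, the largest sum is 16 × 255 × 255 = 1,040,400, which needs 20 bits, consistent with the 20b output the abstract describes as near-full precision.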

Other Subjects

Analog to digital conversion; Computation theory; Dynamic random access storage; Energy efficiency; Energy utilization; Foundries; Green computing; Integrated circuit layout; T-cells; High-precision; Low energy consumption; Memory capacity; Multiply and accumulate operations; Output channels; Parallel computing; Power overhead; Transistor process; Static random access storage

Type

conference paper
