Artificial-intelligence applications running on edge devices are becoming increasingly complex and demand ever-higher precision. To reduce latency and power consumption, compute-in-memory (CIM) technology is being deployed at the edge, minimizing both by cutting the cost of data movement. However, conventional CIM macros support only integer arithmetic, making it difficult to serve edge workloads that call for high accuracy, high complexity, and on-chip training. A purely analog or purely digital scheme alone cannot simultaneously optimize energy efficiency, area efficiency, and accuracy. How to effectively combine the strengths of analog and digital CIM, achieve higher overall energy and area efficiency while preserving accuracy as far as possible, and explore the design space of mixed analog-digital solutions remain pressing open problems in the CIM macro field.
The team of Academician Liu Ming at the Institute of Microelectronics, Chinese Academy of Sciences, and collaborators have developed a hybrid analog-digital CIM macro chip based on outer-product computation. They designed a hybrid-domain floating-point SRAM CIM scheme and proposed a method for combining analog and digital CIM macros, pairing the strength of analog CIM in efficient in-array bitwise multiplication with the strength of digital CIM in efficient multi-bit shift-and-accumulate outside the array, thereby achieving high overall energy and area efficiency. The design uses a residual analog-to-digital converter (ADC) architecture that reduces the required ADC resolution to only the logarithm of the input bit width, yielding high throughput at low overhead. Building on the mathematics of the matrix outer product, a floating-point/fixed-point CIM block architecture completes matrix-vector computation through accumulator elements. Compared with earlier digital CIM schemes based on the matrix inner product, which require large fan-in and multi-level adder trees, this approach reduces propagation delay and achieves higher overall computational throughput. The architecture also supports fine-grained unstructured activation sparsity to further improve energy efficiency. Fabricated in a 28 nm CMOS process, the CIM macro chip supports BF16 floating-point and INT8 fixed-point operation, reaching a peak energy efficiency of 72.12 TFLOPS/W for BF16 matrix-vector computation and 111.17 TOPS/W for INT8 matrix-vector computation. These results offer a new path for CIM architecture chips based on mixed analog-digital solutions.
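The outer-product dataflow described above can be illustrated with a minimal sketch: a matrix-vector product is computed column by column, with each input element scaling one stored column and a per-row accumulator summing the rank-1 partial products. This is only a mathematical illustration of the dataflow, not the chip's circuit, which performs these steps with in-array analog bit multiplication and digital shift-and-accumulate.

```python
import numpy as np

def mv_outer_product(W, x):
    """Matrix-vector product via the outer-product dataflow:
    each input element x[j] scales the stored column W[:, j],
    and one accumulator element per output row sums the
    rank-1 partial products. Unlike an inner-product scheme,
    no wide adder tree over a full row is needed per cycle."""
    m, n = W.shape
    acc = np.zeros(m)              # one accumulator element per output row
    for j in range(n):             # stream the input vector one element at a time
        acc += x[j] * W[:, j]      # rank-1 (outer-product) partial sum
    return acc

W = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([5.0, 6.0])
print(mv_outer_product(W, x))     # matches W @ x
```

The design point the article highlights is visible here: each step touches only one column and a simple accumulate, whereas an inner-product formulation sums an entire row of products at once and therefore needs a large fan-in adder tree.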
The research findings, "A 28nm 72.12TFLOPS/W Hybrid-Domain Outer-Product Based Floating-Point SRAM Computing-in-Memory Macro with Logarithm-Bitwidth Residual ADC," were recently presented at the 2024 International Solid-State Circuits Conference (ISSCC 2024). The study was conducted in collaboration between the Institute of Microelectronics and the Beijing Institute of Technology, and was supported by the National Key R&D Program of China, the National Natural Science Foundation of China, and the Strategic Priority Research Program of the Chinese Academy of Sciences.