Abstract: Nowadays high-performance computing is gradually implementing Exa-scale computing, and the performance of single node has reached several T-flops. Communication problem has become one of the ...
Abstract: Matrix-matrix multiplication is commonly used as the core of operation in linear algebra computations and various applications such as finite element analysis and deep neural networks.