Tensor Compiler

Tensor programs are widely used across domains, including deep learning and scientific computing. Optimizing them is essential, as tensor programs typically operate on large multi-dimensional data and demand massive parallelism to achieve high performance. However, performing such optimizations across different architectures requires significant human effort and expertise in both algorithms and hardware. Tensor, or deep learning (DL), compilers relieve application developers of this burden: they take model definitions written in DL frameworks such as TensorFlow and PyTorch as input and generate efficient code implementations for a variety of AI hardware.
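As an illustrative sketch of this flow, and not a description of any particular system we build, Apache TVM is one such compiler: it can import a PyTorch model into its Relay IR and compile it for a chosen hardware target. The model, input name, shapes, and target below are arbitrary assumptions for the example.

```python
import torch
import tvm
from tvm import relay

# Illustrative model: names, shapes, and target are arbitrary assumptions.
model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU()).eval()
example = torch.randn(1, 64)
scripted = torch.jit.trace(model, example)  # TorchScript is TVM's PyTorch entry point

# Import into TVM's Relay IR, then compile for a hardware target.
mod, params = relay.frontend.from_pytorch(scripted, [("input0", (1, 64))])
target = "llvm"  # e.g., "cuda" for NVIDIA GPUs
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)  # deployable compiled module
```

The resulting module can then be exported or executed through TVM's graph executor on the target device.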

Our research predominantly focuses on the following topics:
1. Auto-tuning tensor program generation
Advances in hardware such as GPUs, TPUs, and DSPs, together with DL frameworks offering optimized kernel support, have fueled DL innovation. As DNN architectures and backend hardware evolve, however, the search space of compiler optimizations has grown manifold, straining the data-driven approaches used to auto-tune tensor compilers and generate efficient tensor programs. We work on methodologies to tune tensor program generation efficiently, making it portable and performant in terms of throughput and latency at lower power consumption on emerging AI hardware, ranging from HPC systems to edge devices. A sketch of such an auto-tuning workflow follows below.
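As a hedged illustration of what data-driven auto-tuning looks like in practice, the sketch below uses TVM's auto-scheduler (Ansor) to search over candidate schedules for a matmul workload. The matrix sizes, trial count, and log file name are arbitrary choices for the example, not settings from our work.

```python
import tvm
from tvm import te, auto_scheduler

# Define a matmul workload in TVM's tensor expression (TE) language.
@auto_scheduler.register_workload
def matmul(N, M, K):
    A = te.placeholder((N, K), name="A")
    B = te.placeholder((K, M), name="B")
    k = te.reduce_axis((0, K), name="k")
    C = te.compute((N, M), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
    return [A, B, C]

target = tvm.target.Target("llvm")  # swap for a GPU or edge target as needed
task = auto_scheduler.SearchTask(func=matmul, args=(1024, 1024, 1024), target=target)

# Measure a small number of candidate schedules (kept low for illustration)
# and log them; the tuner fits a cost model to these measurements.
options = auto_scheduler.TuningOptions(
    num_measure_trials=64,
    measure_callbacks=[auto_scheduler.RecordToFile("matmul_tuning.json")],
)
task.tune(options)

# Apply the best schedule found and build the tuned tensor program.
sch, args = task.apply_best("matmul_tuning.json")
func = tvm.build(sch, args, target)
```

The measured records double as training data for the cost model, which is what makes the search data-driven rather than exhaustive.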

2. Correctness in tensor compilers
The fundamentally different nature of DL systems (less deterministic, more statistical) compared with traditional software poses challenges in evaluating correctness. Given the domains in which these systems are deployed, undetected bugs can have serious consequences.
Testing a DL system comprises two components: testing DL libraries and testing tensor (DL) compilers. Tensor compiler testing primarily concerns correctness, efficiency, robustness, fairness, and interpretability. Our work aims to address these concerns; one common correctness-checking technique, differential testing, is sketched below.
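The sketch below is a minimal example of differential testing, assuming TVM as a representative tensor compiler: it runs the same model through the framework (PyTorch) and through the compiled path, and flags any output mismatch beyond a tolerance as a potential miscompilation. The model, shapes, and tolerances are illustrative assumptions.

```python
import numpy as np
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor

def differential_test(model, input_shape, rtol=1e-4, atol=1e-4):
    """Compare the framework's reference output with the compiled output."""
    model = model.eval()
    x = torch.randn(*input_shape)
    with torch.no_grad():
        ref = model(x).numpy()  # reference result from PyTorch

    # Compile the same model through the tensor compiler.
    scripted = torch.jit.trace(model, x)
    mod, params = relay.frontend.from_pytorch(scripted, [("input0", input_shape)])
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target="llvm", params=params)

    # Run the compiled module and compare against the reference.
    rt = graph_executor.GraphModule(lib["default"](tvm.cpu()))
    rt.set_input("input0", x.numpy())
    rt.run()
    out = rt.get_output(0).numpy()

    # A mismatch beyond tolerance flags a potential miscompilation.
    np.testing.assert_allclose(ref, out, rtol=rtol, atol=atol)

differential_test(torch.nn.Sequential(torch.nn.Linear(8, 4), torch.nn.Tanh()), (2, 8))
```

Because floating-point results legitimately differ across optimization levels and backends, the tolerances must be chosen carefully; this is one reason the statistical nature of DL systems complicates correctness evaluation.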