Matrix multiplication is a computationally expensive operation. A naive implementation of the standard algorithm can cost a lot of resources and time, so an efficient matrix multiplication implementation is needed. Project goal: implementation of an efficient algorithm and infrastructure, with minimum FPGA resources, minimum run time, and maximum throughput.

The main goal was to implement an efficient algorithm and infrastructure in hardware (HW). The implementation should provide low latency and high throughput while using the board resources as efficiently as possible. A secondary goal was to minimize FPGA resource usage. While working, we examined the trade-offs between high performance and low FPGA utilization.

The challenge was to multiply large matrices. We decided that 128×128 matrices with 8-bit cells are large enough. The result matrix is then also 128×128, with 23-bit cells: each product of two 8-bit values needs up to 16 bits, and accumulating 128 such products adds 7 more bits.
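The 23-bit result width can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the project itself; the function name `accumulator_bits` is hypothetical, and since the text does not state whether the cells are signed or unsigned, both cases are covered (two's complement is assumed for the signed case).

```python
def accumulator_bits(cell_bits: int, n: int, signed: bool = False) -> int:
    """Bits needed to hold a sum of n products of two cell_bits-wide values.

    Hypothetical helper for illustration only. Signed operands are assumed
    to be two's complement.
    """
    if signed:
        # Worst case is (-2^(b-1)) * (-2^(b-1)) summed n times: a positive
        # value, so one extra bit is needed for the sign.
        worst = ((1 << (cell_bits - 1)) ** 2) * n
        return worst.bit_length() + 1
    # Unsigned worst case: (2^b - 1)^2 summed n times.
    worst = (((1 << cell_bits) - 1) ** 2) * n
    return worst.bit_length()


if __name__ == "__main__":
    # 8-bit cells, 128 terms per output cell (one row times one column).
    print(accumulator_bits(8, 128))               # unsigned operands
    print(accumulator_bits(8, 128, signed=True))  # signed operands
```

Under these assumptions both cases give the same width for 8-bit cells and 128 terms, so the result width does not depend on the signedness choice here.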

After the implementation, we were also required to build a verification environment in order to run sanity checks.
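One common form of sanity check is to compare the hardware output against a software reference model. The sketch below illustrates that idea in plain Python; it is an assumption about how such a check could look, not a description of the project's actual verification environment. All names (`reference_matmul`, `random_matrix`, `matches_reference`) are hypothetical, and unsigned cells are assumed.

```python
import random


def reference_matmul(a, b):
    """Straightforward software reference: C = A x B for lists of rows."""
    b_columns = list(zip(*b))  # transpose B so each column can be iterated
    return [
        [sum(x * y for x, y in zip(row, col)) for col in b_columns]
        for row in a
    ]


def random_matrix(n, bits=8, seed=None):
    """n x n matrix of random unsigned values that fit in `bits` bits."""
    rng = random.Random(seed)
    return [[rng.randrange(1 << bits) for _ in range(n)] for _ in range(n)]


def matches_reference(a, b, dut_result):
    """True if the result read back from the device equals the reference."""
    return dut_result == reference_matmul(a, b)


if __name__ == "__main__":
    n = 128
    a = random_matrix(n, seed=1)
    b = random_matrix(n, seed=2)
    # Stand-in for the matrix read back from the FPGA; here the reference
    # itself is reused, so the check passes trivially.
    dut_result = reference_matmul(a, b)
    print(matches_reference(a, b, dut_result))
```

In a real flow, `dut_result` would be replaced by the values captured from simulation or from the board, and a few corner cases (all-zero and all-maximum inputs) would be worth adding alongside the random ones.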

Lastly, to test the performance, we added HW logic that measures the exact run time.