Vector Accelerator for RISC-V architecture

This project explores a simple design and implementation of a Vector Processing Unit attached to a RISC-V core with a multi-cycle microarchitecture.
We implemented the design on an FPGA, executed code on it, and measured and compared the performance and power of the integer processor against those of our vector processor.
The comparative evaluation showed that, at the cost of quadrupling the hardware, the vector processor achieved significant advantages in both energy and execution time.

This project investigates a simple design and implementation of a Vector Processing Unit (VPU) attached to a RISC-V core. We took a RISC-V 32-bit integer processor implemented as a multi-cycle microarchitecture and added a VPU to it. We implemented the design on an FPGA, executed code on it, and measured and compared the performance and power of the integer processor against those of our vector processor (block diagrams can be seen in Figure 1). We performed the following steps: (a) adopted the simulation-only integer core used in the “Digital Systems and Computer Structure – 044252” course, (b) converted that integer core into a synthesizable design, (c) implemented that core on an FPGA, (d) added our VPU to it, resulting in our version of a RISC-V Vector Processor, and (e) executed test programs on both the integer and vector processors as implemented in hardware on the FPGA. The comparative evaluation showed that, at the cost of quadrupling the hardware, the vector processor achieved significant advantages in both energy and execution time.

 

This project confirms that the RISC-V multi-cycle microarchitecture is a suitable host for vector processing. We found the combination of the two fruitful: it turned the inefficient multi-cycle microarchitecture into an efficient vector processing machine, our RISC-V Vector Processor. At the cost of added hardware complexity, we gained a substantial improvement in both execution time and energy consumption. We also developed a performance measurement system that enabled us to measure both the processing time and the power consumption of our design. This measurement system can be adopted by any project that uses the NetFPGA-SUME development board.
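The relationship between the two measured quantities, processing time and power, and the reported energy comparison can be sketched as follows. This is only an illustrative sketch, not the project's measurement code: the function names `energy_joules` and `compare_runs` and their parameters are hypothetical, and it assumes that average power over a run is a reasonable summary of the power measurement.

```python
def energy_joules(avg_power_watts: float, exec_time_seconds: float) -> float:
    """Energy of one run: average power multiplied by execution time."""
    return avg_power_watts * exec_time_seconds


def compare_runs(int_power_w: float, int_time_s: float,
                 vec_power_w: float, vec_time_s: float) -> tuple[float, float]:
    """Return (speedup, energy_ratio) of the vector run relative to the integer run.

    Both values are greater than 1 when the vector processor is better.
    All four inputs are assumed to come from measurements of the same program.
    """
    speedup = int_time_s / vec_time_s
    energy_ratio = (energy_joules(int_power_w, int_time_s)
                    / energy_joules(vec_power_w, vec_time_s))
    return speedup, energy_ratio
```

Under this model, a larger design that draws more power can still consume less energy overall, provided its speedup exceeds the ratio by which its power increased.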

 

The project was submitted for the Yehoraz Kasher Award.

Project presentation in competition

Presentation file

Poster

Final Demo Video

 

Oded Eini odedeini@gmail.com

Lionn Bruckstein lbbt.15@gmail.com

Instructor: Professor Ran Ginosar