针对传统超长指令字(Very Long Instruction Word,VLIW)处理器代码体积增大会显著降低处理器性能的问题,设计了一种八流出新型变长指令跨边界派发窗。该派发窗兼容压缩指令派发功能,支持压缩指令和整字指令混合派发,有效减小了处理器代...针对传统超长指令字(Very Long Instruction Word,VLIW)处理器代码体积增大会显著降低处理器性能的问题,设计了一种八流出新型变长指令跨边界派发窗。该派发窗兼容压缩指令派发功能,支持压缩指令和整字指令混合派发,有效减小了处理器代码体积。同时该派发窗引入指令跨边界派发机制,进一步排出指令间无用气泡。通过搭建派发窗仿真模型,并基于DSP/VoLIB库进行仿真,结果显示,采用新型变长指令跨边界派发窗能够充分发挥指令级并行优势。经编译器调度优化后,库中典型程序体积比传统派发窗平均降低约19.26%,处理器性能提升约15.4%。展开更多
指令压缩技术能够克服传统超长指令字(very long instruction word,VLIW)结构的指令高速缓冲(cache)中长指令字密度低的缺陷,使长指令字中的各条指令能紧密地排列在高速缓冲行(cache line)中,但可能导致长指令字分置于两个cache line,...指令压缩技术能够克服传统超长指令字(very long instruction word,VLIW)结构的指令高速缓冲(cache)中长指令字密度低的缺陷,使长指令字中的各条指令能紧密地排列在高速缓冲行(cache line)中,但可能导致长指令字分置于两个cache line,使其不能同时参与取指与发射,从而成为处理器的性能瓶颈.受到分置cache line的影响,传统提升循环效率的软件流水方法性能下降.高性能变长指令发射窗的机制能够解决分离指令字带来的取指发射问题,为取指流水线提供高效连续的指令流,特别地,该机制缓存循环的一次迭代,硬件支持循环的软件流水,有效地增强VLIW结构的数字信号处理器(digital signal processor,DSP)的性能.通过搭建时钟精确的处理器仿真模型,并基于DSP?IMG库上进行仿真,结果显示,采用两级指令发射窗机制,平均性能提高约21.89%.展开更多
Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for mat...Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for matrix inversion. A global register file (RF) is used to connect al the clusters. Two nearby clusters share a local register file. The instruction sets are also designed for the VLIW processor. Experimental results show that the proposed VLIW architecture takes only 45 latency to invert a 4 × 4 matrix when running at 150 MHz. The proposed design is roughly five times faster than the DSP solution in processing speed.展开更多
文摘针对传统超长指令字(Very Long Instruction Word,VLIW)处理器代码体积增大会显著降低处理器性能的问题,设计了一种八流出新型变长指令跨边界派发窗。该派发窗兼容压缩指令派发功能,支持压缩指令和整字指令混合派发,有效减小了处理器代码体积。同时该派发窗引入指令跨边界派发机制,进一步排出指令间无用气泡。通过搭建派发窗仿真模型,并基于DSP/VoLIB库进行仿真,结果显示,采用新型变长指令跨边界派发窗能够充分发挥指令级并行优势。经编译器调度优化后,库中典型程序体积比传统派发窗平均降低约19.26%,处理器性能提升约15.4%。
文摘指令压缩技术能够克服传统超长指令字(very long instruction word,VLIW)结构的指令高速缓冲(cache)中长指令字密度低的缺陷,使长指令字中的各条指令能紧密地排列在高速缓冲行(cache line)中,但可能导致长指令字分置于两个cache line,使其不能同时参与取指与发射,从而成为处理器的性能瓶颈.受到分置cache line的影响,传统提升循环效率的软件流水方法性能下降.高性能变长指令发射窗的机制能够解决分离指令字带来的取指发射问题,为取指流水线提供高效连续的指令流,特别地,该机制缓存循环的一次迭代,硬件支持循环的软件流水,有效地增强VLIW结构的数字信号处理器(digital signal processor,DSP)的性能.通过搭建时钟精确的处理器仿真模型,并基于DSP?IMG库上进行仿真,结果显示,采用两级指令发射窗机制,平均性能提高约21.89%.
基金supported by the National Natural Science Foundation of China(6110015561227004+4 种基金613720716137213161201289)the Fundamental Research Funds of the Central Universities of China(K5051302096JB140207)
文摘Matrix inversion is a critical part in communication, signal processing and electromagnetic system. A flexible and scalable very long instruction word (VLIW) processor with clustered architecture is proposed for matrix inversion. A global register file (RF) is used to connect al the clusters. Two nearby clusters share a local register file. The instruction sets are also designed for the VLIW processor. Experimental results show that the proposed VLIW architecture takes only 45 latency to invert a 4 × 4 matrix when running at 150 MHz. The proposed design is roughly five times faster than the DSP solution in processing speed.