8-byte alignment for FP32 vectors, resulting in poor performance with some access patterns (e.g., stencil-like). • VE30 relaxes the restriction to 4-byte alignment Performance Evaluation of a Next-Generation SX-Aurora TSUBASA Vector Supercomputer 27 0 10 20 30 40 50 60 70 VE20 w/o packed VE30 w/o packed VE30 w/ packed GFLOP/s do k = 1, nz do j = 1, ny do i = 1, nx a(i,j,k) = a(i,j,k) + & (b(i-1,j-1,k-1) + b(i ,j-1,k-1) + b(i+1,j-1,k-1) + & b(i-1,j ,k-1) + b(i ,j ,k-1) + b(i+1,j ,k-1) + & b(i-1,j+1,k-1) + b(i ,j+1,k-1) + b(i+1,j+1,k-1) + & b(i-1,j-1,k ) + b(i ,j-1,k ) + b(i+1,j-1,k ) + & b(i-1,j ,k ) + b(i ,j ,k ) + b(i+1,j ,k ) + & b(i-1,j+1,k ) + b(i ,j+1,k ) + b(i+1,j+1,k ) + & b(i-1,j-1,k+1) + b(i ,j-1,k+1) + b(i+1,j-1,k+1) + & b(i-1,j ,k+1) + b(i ,j ,k+1) + b(i+1,j ,k+1) + & b(i-1,j+1,k+1) + b(i ,j+1,k+1) + b(i+1,j+1,k+1))/27.0 end do end do end do 27-point stencil microbenchmark