Kernel MG is a benchmark based on a three-dimensional multigrid method. I used segment-directed data distribution, and with the overlapping methods an improvement of up to 26% was achieved. For the class A problem of kernel MG, communication wait time accounts for about 50% of the execution time, calculation for about 40%, and receive time for about 10%. If the wait time could be hidden completely, up to 50% improvement could be expected for this kernel, but the achieved improvement was only 26%. More effective tuning methods should therefore be investigated for this kernel.
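The gap between the expected and the achieved improvement can be made concrete with a short calculation. The sketch below is only illustrative: it assumes that "improvement" means the fraction of the original execution time that was saved, and that calculation and receive time are unchanged by overlapping. The function name `overlap_bound` and the variable names are hypothetical, not part of the benchmark code.

```python
def overlap_bound(wait, calc, recv):
    """Fraction of execution time saved if all wait time is hidden.

    Assumption: calculation and receive time are unaffected by overlapping.
    """
    total = wait + calc + recv
    return wait / total


# Class A breakdown quoted in the text, in percent of execution time.
wait, calc, recv = 50, 40, 10

bound = overlap_bound(wait, calc, recv)  # best case under the assumption above
achieved = 0.26                          # improvement reported in the text
hidden_share = achieved / bound          # part of the wait time actually hidden

print(f"upper bound on improvement: {bound:.0%}")
print(f"share of wait time hidden:  {hidden_share:.0%}")
```

Under these assumptions, roughly half of the communication wait time is still exposed after tuning, which is where further gains would have to come from.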
Table 11: Execution time with performance improvement. Case 1 is the sample problem run on 6 SUN SparcStation-2 workstations, case 2 is the sample problem with 8 processors, and case 3 is the class A problem with 64 processors. Model 1 is the original (non-tuned) version, model 2 is calculation order tuning, model 3 is overlapping grid assignment to the neighbor processors, model 4 is communication/calculation overlapping, and model 5 is segment-directed task assignment.