
Optimising a**b in f77 on SGI Origin3800

By Arild Burud
Norwegian Service Centre for Climate Modelling
May 14, 2002

Introduction

This report extends the earlier report "Optimising a modified ccm3.2 climate model" by Arild Burud and Egil Støren.

We have further investigated the performance of power expressions (a**b) in Fortran programs and how compiler options affect it.
This brief report summarizes the methods and findings.

Description of tests

A small Fortran program was written that makes heavy use of the power operator a**b, structured so that the calculation could not be removed by optimisation. The basic form of the routine is as follows:

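      program powtest
      implicit none
      integer i1
! Double precision, matching the double-precision (.d) instructions
! in the compiler listings below
      double precision x1, a
      x1 = 0.1d0
      a = 1.0d0
      do i1 = 1, 50000000
! Expression under test; this line is varied in the tests below
         x1 = x1 + a**2.001
! a changes every iteration, so the power cannot be hoisted
         a = a + 1.3914d-8
      enddo
! Printing the results prevents the loop from being optimised away
      write (6,*) x1, a
      stop
      end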

The program line "x1 = x1 + a**2.001" was then modified to test different expressions, and each variant was compiled with different optimisation flags.
The purpose was to find the most efficient way to write such expressions.

Compiler translations

Inspection of the generated machine code shows that the compiler uses several strategies, and which one it chooses depends on the optimisation level. In the listings below, the base variable is named x2 rather than a.

a**2 is always translated to the multiplication a*a, regardless of optimisation level. In the listing below it is folded into a single multiply-add instruction that computes x1 + x2*x2.

 #   6  	 x1 = x1 + x2**2
	madd.d $f28,$f28,$f24,$f24    	# [2]

a**N.0 is simplified to a**N only at high optimisation (-Ofast).

At lower optimisation (-O2), a**N is translated into a call to the math library routine powdi() (__powdi in the listing).

 #   6  	 x1 = x1 + x2**5 
	lw $25,%call16(__powdi)($gp)  	# [0]  __powdi
	mov.d $f12,$f22               	# [3]  
	jalr $25                      	# [3]  __powdi
	addiu $5,$0,5                 	# [3]
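The listing shows only the call to __powdi, not what that routine does internally. A common way to evaluate an integer power is repeated squaring; the sketch below shows that technique as a hypothetical Fortran function ipowsq in double precision. It is an assumption for illustration, not the SGI library source.

```fortran
      double precision function ipowsq(x, n)
! Hypothetical routine: x**n for integer n by repeated squaring.
! NOT the SGI __powdi source; only a sketch of one common method.
      implicit none
      double precision x
      integer n
      double precision base, res
      integer k
      base = x
      res = 1.0d0
      k = abs(n)
! Each pass multiplies the current base into the result when the
! lowest remaining bit of the exponent is set, then squares it.
      do while (k .gt. 0)
         if (mod(k, 2) .eq. 1) res = res*base
         base = base*base
         k = k/2
      enddo
! A negative exponent gives the reciprocal
      if (n .lt. 0) res = 1.0d0/res
      ipowsq = res
      return
      end
```

A call such as ipowsq(a, 5) gives the same value as a**5 up to rounding. Whatever the library does internally, the results table shows that the call itself is costly: a**5 takes about 4 s at -O2, against 0.4 s for a*a*a*a*a written out.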

At lower optimisation (-O2), a**N.0 is translated into a call to the math library routine pow().

 #   6  	 x1 = x1 + x2**2.0 
	lw $25,%call16(pow)($gp)      	# [0]  pow
	mov.d $f12,$f20               	# [2]  
	jalr $25                      	# [3]  pow
	mov.d $f13,$f24               	# [3]
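Why a real exponent is so much more expensive can be seen from the identity a**b = exp(b*log(a)) for a > 0: a general power needs a logarithm and an exponential rather than a few multiplications. The hypothetical function rpow below evaluates that identity directly in double precision. Library pow() routines are more careful about accuracy and special cases, so treat this only as an illustration of the work involved.

```fortran
      double precision function rpow(a, b)
! Hypothetical illustration: a**b for a > 0 via exp(b*log(a)).
! Handles a > 0 only; no special cases (a <= 0, overflow).
      implicit none
      double precision a, b
      rpow = exp(b*log(a))
      return
      end
```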

At high optimisation (-Ofast), a**N is translated into a sequence of multiplications (a*a*a*a...).

With -Ofast the compiler also uses the processor pipeline more efficiently by unrolling and replicating loops, although the gain is small for loops that contain math library calls. The effect is only marginally visible in the results of these tests.

The -lfastm option links in an optimised math library. This library appears to be slow for a**2.0 (6.82 s against 3.7 s with plain -O2), but that expression should have been replaced by the quicker a*a anyway. For the other expressions that call pow(), -lfastm roughly halves the run time, while the powdi() cases change little.
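The expansion of a**N into multiplications can also be written by hand when -Ofast cannot be used. The sketch below is a hypothetical function pow5 in double precision that computes a**5 by squaring first, which needs three multiplications instead of the four in a*a*a*a*a. It does not reproduce the exact code the compiler emits, and its result may differ from a**5 in the last bit.

```fortran
      double precision function pow5(a)
! Hypothetical helper: a**5 using only multiplications.
! Squaring first needs 3 multiplications instead of 4.
      implicit none
      double precision a, a2, a4
      a2 = a*a
      a4 = a2*a2
      pow5 = a4*a
      return
      end
```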

Results of tests

A rough timing test was performed on these expressions. The table below gives the run time of each test, in seconds, for each set of compiler options:

Expression    -Ofast  -O2 -lfastm     -O2
a**2.001       13.65         6.83   13.65
a**2.0          0.21         6.82     3.7
a**2             0.2          0.2    0.21
a**3.0           0.2         6.84   13.65
a**3             0.2          3.1    3.11
a*a*a            0.2          0.2    0.21
a**5.0           0.3         6.83   13.65
a**5             0.3         4.02    4.11
a*a*a*a*a       0.34          0.4     0.4
a**1.5          3.51         6.83   13.65
a*sqrt(a)       3.52         3.51    3.50

All times are user time in seconds, as reported by the time command. The tests were performed on gridur.ntnu.no.
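As a rough worked example (ignoring loop overhead and program start-up), dividing the run times by the 50 000 000 iterations of the test loop gives the approximate cost of one evaluation:

```latex
\begin{aligned}
t_{\mathrm{iter}} &= t_{\mathrm{total}} / (5 \times 10^{7}) \\
\texttt{a**3.0},\ \texttt{-O2}: &\quad 13.65\ \mathrm{s} / (5 \times 10^{7}) \approx 273\ \mathrm{ns} \\
\texttt{a**3.0},\ \texttt{-O2 -lfastm}: &\quad 6.84\ \mathrm{s} / (5 \times 10^{7}) \approx 137\ \mathrm{ns} \\
\texttt{a*a*a},\ \texttt{-O2}: &\quad 0.21\ \mathrm{s} / (5 \times 10^{7}) = 4.2\ \mathrm{ns}
\end{aligned}
```

At -O2, writing a**3.0 instead of a*a*a therefore costs about 65 times as much per evaluation (13.65/0.21 = 65).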

Conclusions

The primary conclusion is that testing is essential to achieve the highest performance.

Brute-force use of compiler optimisation (-Ofast) is the best solution, if the program allows it.

Always write powers with an integer exponent when possible (a**N instead of the equivalent a**N.0).

If high optimisation is not possible, expand powers into multiplications by hand (a**N = a*a*a*a...), or simplify them in other ways (a**1.5 = a*sqrt(a)).
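The a**1.5 rewrite can be packaged as a small function. The name pow15 is hypothetical; the sketch assumes double precision and a >= 0, since a real base raised to a non-integer power is undefined for negative a.

```fortran
      double precision function pow15(a)
! Hypothetical helper: a**1.5 rewritten as a*sqrt(a), i.e. one
! square root and one multiplication instead of a pow() call.
! Assumes a >= 0.
      implicit none
      double precision a
      pow15 = a*sqrt(a)
      return
      end
```

In the timing table, this form runs in about 3.5 s under every set of options, against 13.65 s for a**1.5 at plain -O2.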

Use -lfastm (optimised math library) when the highest accuracy is not necessary.
