Next: Speed-space tradeoffs, Up: Compiling with optimization [Contents][Index]
The first form of optimization used by GCC occurs at the source-code level, and does not require any knowledge of the machine instructions. There are many source-level optimization techniques—this section describes two common types: common subexpression elimination and function inlining.
One method of source-level optimization which is easy to understand involves computing an expression in the source code with fewer instructions, by reusing already-computed results. For example, the following assignment:
x = cos(v)*(1+sin(u/2)) + sin(w)*(1-sin(u/2))
can be rewritten with a temporary variable t
to eliminate an
unnecessary extra evaluation of the term sin(u/2)
:
t = sin(u/2) x = cos(v)*(1+t) + sin(w)*(1-t)
This rewriting is called common subexpression elimination (CSE), and is performed automatically when optimization is turned on.19 Common subexpression elimination is powerful, because it simultaneously increases the speed and reduces the size of the code.
Another type of source-level optimization, called function inlining, increases the efficiency of frequently-called functions.
Whenever a function is used, a certain amount of extra time is required for the CPU to carry out the call: it must store the function arguments in the appropriate registers and memory locations, jump to the start of the function (bringing the appropriate virtual memory pages into physical memory or the CPU cache if necessary), begin executing the code, and then return to the original point of execution when the function call is complete. This additional work is referred to as function-call overhead. Function inlining eliminates this overhead by replacing calls to a function by the code of the function itself (known as placing the code in-line).
In most cases, function-call overhead is a negligible fraction of the total run-time of a program. It can become significant only when there are functions which contain relatively few instructions, and these functions account for a substantial fraction of the run-time—in this case the overhead then becomes a large proportion of the total run-time.
Inlining is always favorable if there is only one point of invocation of a function. It is also unconditionally better if the invocation of a function requires more instructions (memory) than moving the body of the function in-line. This is a common situation for simple accessor functions in C++, which can benefit greatly from inlining. Moreover, inlining may facilitate further optimizations, such as common subexpression elimination, by merging several separate functions into a single large function.
The following function sq(x)
is a typical example of a function
that would benefit from being inlined. It computes x^2, the
square of its argument x:
double sq (double x) { return x * x; }
This function is small, so the overhead of calling it is comparable to the time taken to execute the single multiplication carried out by the function itself. If this function is used inside a loop, such as the one below, then the function-call overhead would become substantial:
for (i = 0; i < 1000000; i++) { sum += sq (i + 0.5); }
Optimization with inlining replaces the inner loop of the program with the body of the function, giving the following code:
for (i = 0; i < 1000000; i++) { double t = (i + 0.5); /* temporary variable */ sum += t * t; }
Eliminating the function call and performing the multiplication in-line allows the loop to run with maximum efficiency.
GCC selects functions for inlining using a number of heuristics, such as
the function being suitably small. As an optimization, inlining is
carried out only within each object file. The inline
keyword can
be used to request explicitly that a specific function should be inlined
wherever possible, including its use in other files.20 The GCC Reference
Manual “Using GCC” provides full details of the inline
keyword, and its use with the static
and extern
qualifiers
to control the linkage of explicitly inlined functions (see Further reading).
Temporary values introduced by the compiler during common subexpression elimination are only used internally, and do not affect real variables. The name of the temporary variable ‘t’ shown above is only used as an illustration.
In this case, the definition of the inline function must be made available to the other files (e.g. in a header file).
Next: Speed-space tradeoffs, Up: Compiling with optimization [Contents][Index]