Q-NOTE QN-7000HX Technical Information, Page 15

Software Performance Optimization Methods
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 15
Using the techniques above, you can rewrite the C source code in the following style to help the
compiler vectorize it automatically.
float dot_product(float * restrict pa, float * restrict pb, unsigned int len)
{
    float sum = 0.0;
    unsigned int i;
    for (i = 0; i < (len & ~3); i++)
        sum += pa[i] * pb[i];
    return sum;
}
GCC also supports the alternative forms __restrict__ and __restrict when not
compiling for C99. You can select the language standard that the compiler applies with the
-std option, for example -std=c99. Other possible values include c90 and gnu99.
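As an illustration of the alternative spellings, the sketch below picks a qualifier according to the language standard in effect. The macro name RESTRICT and the function name dot_product_portable are assumptions made for this example; they do not come from the application note.

```c
#include <assert.h>

/* RESTRICT is a hypothetical macro name chosen for this sketch. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#define RESTRICT restrict        /* C99 or later: standard keyword */
#elif defined(__GNUC__)
#define RESTRICT __restrict__    /* GCC extension, usable before C99 */
#else
#define RESTRICT                 /* unknown compiler: no aliasing hint */
#endif

/* Same loop as above, with the qualifier selected by the macro. */
float dot_product_portable(const float *RESTRICT pa,
                           const float *RESTRICT pb,
                           unsigned int len)
{
    float sum = 0.0f;
    unsigned int i;

    assert(pa != 0 && pb != 0);

    /* Only the first (len & ~3) elements are processed, as in the original. */
    for (i = 0; i < (len & ~3u); i++)
        sum += pa[i] * pb[i];
    return sum;
}
```

With GCC, this should compile under both -std=c90 and -std=c99, because the macro expands to whichever spelling is accepted.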
Some publications state that manually unrolling the loop, as shown in the example below,
makes automatic vectorization easier for the compiler. However, recent GCC compilers
recognize and vectorize the code above more readily than the manually unrolled version.
In practice, compilers might not vectorize the manually unrolled loop at all.
float dot_product(float * restrict pa, float * restrict pb, unsigned int len)
{
    float sum[4] = {0.0, 0.0, 0.0, 0.0};
    unsigned int i;
    for (i = 0; i < (len & ~3); i += 4)
    {
        sum[0] += pa[i] * pb[i];
        sum[1] += pa[i+1] * pb[i+1];
        sum[2] += pa[i+2] * pb[i+2];
        sum[3] += pa[i+3] * pb[i+3];
    }
    return sum[0] + sum[1] + sum[2] + sum[3];
}
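Both versions above process only the first (len & ~3) elements, so up to three trailing elements are ignored. If the full dot product is needed, one possible approach is a short scalar tail loop after the main loop. The following is a minimal sketch under that assumption; the name dot_product_full is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: main loop over a multiple of 4, then a scalar tail. */
float dot_product_full(const float *restrict pa, const float *restrict pb,
                       unsigned int len)
{
    const unsigned int main_len = len & ~3u;  /* largest multiple of 4 <= len */
    float sum = 0.0f;
    unsigned int i;

    assert(pa != 0 && pb != 0);

    /* Main loop: the trip count is a multiple of 4, matching 4-lane float vectors. */
    for (i = 0; i < main_len; i++)
        sum += pa[i] * pb[i];

    /* Tail loop: at most 3 leftover elements, handled one at a time. */
    for (; i < len; i++)
        sum += pa[i] * pb[i];

    return sum;
}
```

Keeping the tail in a separate loop leaves the main loop in the simple form shown earlier.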
Use Suitable Data Types
When optimizing algorithms that operate on 16-bit or 8-bit data without SIMD, treating the data
as 32-bit variables can sometimes yield better performance. This is because the compiler must
generate additional instructions to ensure that the result does not overflow a halfword or byte.
However, when targeting automatic vectorization with NEON, using the smallest data type that
can hold the required values is generally the best choice. In a given time period, the NEON
engine can process twice as many 8-bit values as 16-bit values. Also, some NEON instructions do
not support certain data types, and some data types are supported only by certain operations.
For example, NEON does not support double-precision floating-point data, so using a double
where a single-precision float is adequate can prevent the compiler from vectorizing the code.
NEON supports 64-bit integers only for certain operations, so avoid 64-bit integer variables
(long long on 32-bit ARM targets) where possible.
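To make the data-type advice concrete, here is a small sketch of a loop that keeps its data in 8-bit elements. It assumes that modulo-256 wraparound of the result is acceptable; the function name add_u8 is hypothetical. Whether a given compiler actually vectorizes it depends on the compiler version and options.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: element-wise add on 8-bit data, result modulo 256. */
void add_u8(uint8_t *restrict dst, const uint8_t *restrict a,
            const uint8_t *restrict b, unsigned int len)
{
    unsigned int i;

    assert(dst != 0 && a != 0 && b != 0);

    /* With 8-bit elements, a 128-bit NEON register holds 16 values,
       compared with 8 values at 16 bits or 4 values at 32 bits. */
    for (i = 0; i < len; i++)
        dst[i] = (uint8_t)(a[i] + b[i]);
}
```

If the sums had to be kept without wraparound, a wider destination type would be needed, at the cost of fewer elements per vector.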
NEON includes a group of instructions that can perform structured load and store operations.
These instructions can only be used for vectorized access to data structures in which all
members are the same size. Using them to access two-, three-, or four-channel interleaved
data can also improve NEON memory access performance.
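The following sketch shows the kind of data layout this paragraph describes: three-channel interleaved pixels whose members are all the same size. The type rgb8_t, the function rgb_to_gray, and the integer weights are illustrative assumptions. The code is plain C; it does not use NEON intrinsics, and it is up to the compiler whether a structured load is generated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pixel type: three interleaved channels, all 8 bits wide. */
typedef struct {
    uint8_t r;
    uint8_t g;
    uint8_t b;
} rgb8_t;

/* Convert interleaved RGB to gray using integer weights that sum to 256. */
void rgb_to_gray(uint8_t *restrict gray, const rgb8_t *restrict px,
                 unsigned int len)
{
    unsigned int i;

    assert(gray != 0 && px != 0);

    /* Each iteration reads one r, one g and one b from consecutive bytes.
       A three-element structured load can de-interleave such data into
       separate registers, one per channel. */
    for (i = 0; i < len; i++)
        gray[i] = (uint8_t)((77u * px[i].r + 150u * px[i].g + 29u * px[i].b) >> 8);
}
```

Because every member is a uint8_t, the structure has no padding and each channel appears at a fixed stride of three bytes.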
Deviation of NEON Computation Results
For integers, the order of computation does not matter. For example, summing an array of
integers forward or backward always produces the same result. However, this is not true for
floating-point numbers because of their limited precision: each operation rounds its result,
so a different order of operations can give a different sum. Thus, the NEON-optimized code might
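The effect of evaluation order on floating-point sums can be seen in a small sketch. It compares a single running sum with four partial sums, which imitates in plain C how a 4-lane vector reduction groups the additions. The function names and the test data are assumptions made for this example.

```c
#include <assert.h>

/* Scalar order: one running sum, elements added left to right. */
float sum_sequential(const float *v, unsigned int n)
{
    float sum = 0.0f;
    unsigned int i;

    for (i = 0; i < n; i++)
        sum += v[i];
    return sum;
}

/* Vector-like order: four partial sums, one per lane, combined at the end. */
float sum_four_lanes(const float *v, unsigned int n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    unsigned int i;

    assert((n & 3u) == 0);  /* this sketch assumes n is a multiple of 4 */

    for (i = 0; i < n; i += 4) {
        lane[0] += v[i];
        lane[1] += v[i + 1];
        lane[2] += v[i + 2];
        lane[3] += v[i + 3];
    }
    return lane[0] + lane[1] + lane[2] + lane[3];
}
```

For the input {1e8f, 1.0f, 1.0f, 1.0f, -1e8f, 1.0f, 1.0f, 1.0f}, and assuming every partial sum is rounded to IEEE 754 single precision, sum_sequential returns 3.0f while sum_four_lanes returns 6.0f. In the sequential order, the first three 1.0f values are lost when they are added to 1e8f.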