Boost NEON Performance by Improving Memory Access Efficiency
XAPP1206 v1.1 June 12, 2014 www.xilinx.com 22
A graphical demonstration of VST3 is shown in Figure 5.
VLD2 loads two or four registers, de-interleaving even and odd elements. This could be used,
for example, to split left and right channel stereo audio data. Similarly, VLD3 could be used to
split RGB pixels or XYZ coordinates into separate channels. Correspondingly, VLD4 could be
used with ARGB or CMYK images.
Note:
These special NEON instructions cannot be expressed in pure C. You must use NEON
intrinsics or assembler code to make the compiler generate these machine instructions.
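As an illustration of the RGB case described above, the following is a minimal sketch and is not taken from the lab source files. The function name split_rgb is hypothetical. It assumes a compiler that provides arm_neon.h (for example GCC or Clang targeting ARM with NEON enabled), where the vld3_u8 intrinsic is expected to map to a VLD3 instruction. On targets without NEON, only the scalar loop is compiled, which also serves as a reference for what the de-interleaving load does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split packed RGBRGB... bytes into three planes. */
void split_rgb(const uint8_t *rgb, uint8_t *r, uint8_t *g, uint8_t *b,
               size_t pixels)
{
    size_t i = 0;

    assert(rgb != NULL && r != NULL && g != NULL && b != NULL);

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* Process 8 pixels (24 bytes) per iteration. vld3_u8 de-interleaves
     * the bytes so that val[0] holds R, val[1] holds G, val[2] holds B. */
    for (; i + 8 <= pixels; i += 8) {
        uint8x8x3_t px = vld3_u8(rgb + 3 * i);
        vst1_u8(r + i, px.val[0]);
        vst1_u8(g + i, px.val[1]);
        vst1_u8(b + i, px.val[2]);
    }
#endif

    /* Scalar tail, and the full fallback when NEON is not available. */
    for (; i < pixels; i++) {
        r[i] = rgb[3 * i];
        g[i] = rgb[3 * i + 1];
        b[i] = rgb[3 * i + 2];
    }
}
```

The reverse operation, re-interleaving three planes into packed pixels, would use the matching vst3_u8 intrinsic in the same way.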
Using the Preload Engine to Improve the Cache Hit Rate
ARM Cortex-A9 processors support speculation and out-of-order execution, which can hide
latencies associated with memory accesses. However, accesses to the external memory
system are usually so slow that there is still some penalty. If you can pre-fetch instructions or
data into the cache before you need them, you can minimize CPU stall time and maximize CPU
performance.
From a hardware perspective, all preload instructions are handled by a dedicated unit in the
Cortex-A9 processor with dedicated resources. This avoids using resources in the integer core
or the load store unit.
From a software perspective, cache preloading is exposed through three instructions: PLD
(data cache preload), PLI (instruction cache preload), and PLDW (preload data with intent to
write). The PLD instruction might generate a cache line-fill on a data cache miss while the
processor continues to execute other instructions. If used correctly, PLD can significantly
improve performance by hiding memory access latencies. The PLI instruction lets you hint to
the processor that an instruction load from a particular address is likely to happen soon,
which can cause the processor to preload those instructions into its instruction cache.
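One way to issue data preloads from C is sketched below. This is a minimal sketch, not the lab 2 code. It assumes GCC or Clang, whose __builtin_prefetch(addr, rw, locality) built-in typically compiles to PLD on ARM for rw = 0 (and to PLDW for rw = 1 on cores that support it). The function name sum_with_preload and the values of CACHE_LINE_WORDS and PRELOAD_AHEAD are assumptions for illustration. The preload distance in particular must be tuned on the actual hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values: 32-byte cache lines (8 words) and a preload distance of
 * 8 lines ahead. Both are illustrative and need tuning per platform. */
#define CACHE_LINE_WORDS 8u
#define PRELOAD_AHEAD    (8u * CACHE_LINE_WORDS)

/* Hypothetical helper: sum an array while hinting at future reads. */
uint32_t sum_with_preload(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;

    assert(data != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Issue one hint per cache line rather than one per element.
         * rw = 0 requests a read preload; locality = 3 asks to keep
         * the line in cache. The hint does not change the result. */
        if (i % CACHE_LINE_WORDS == 0 && i + PRELOAD_AHEAD < n)
            __builtin_prefetch(&data[i + PRELOAD_AHEAD], 0, 3);
        sum += data[i];
    }
    return sum;
}
```

Because a preload is only a hint, the function returns the same value with or without it. Any benefit shows up only in timing, and only when the data is not already in the cache.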
Lab 2
1. Create a new project in Xilinx SDK.
2. Import the lab 2 source files.
3. Run the source files on hardware.
4. Observe the performance improvement you have obtained.
Figure 5: Demonstration of VST3 Operation