
Boost NEON Performance by Improving Memory Access Efficiency

XAPP1206 v1.1 June 12, 2014 www.xilinx.com 22

A graphical demonstration of VST3 is shown in Figure 5.
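As a concrete illustration of the store-side operation in Figure 5, the sketch below interleaves three planar 8-bit channels into packed RGB triplets with the `vst3q_u8` intrinsic. The function name `interleave_rgb`, the 8-bit element type, and the 16-pixel block size are assumptions made for the example. The plain C loop handles leftover pixels and lets the sketch compile on targets without NEON.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: interleave planar channels into packed RGB, so that
 * rgb[3*i + 0] = r[i], rgb[3*i + 1] = g[i], rgb[3*i + 2] = b[i]. */
void interleave_rgb(const uint8_t *r, const uint8_t *g, const uint8_t *b,
                    uint8_t *rgb, size_t n)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* 16 pixels per iteration: three plain loads, then one VST3 store
     * that writes the three registers element-interleaved to memory. */
    for (; i + 16 <= n; i += 16) {
        uint8x16x3_t px;
        px.val[0] = vld1q_u8(r + i);
        px.val[1] = vld1q_u8(g + i);
        px.val[2] = vld1q_u8(b + i);
        vst3q_u8(rgb + 3 * i, px);
    }
#endif
    /* Scalar tail, and the whole job on non-NEON builds. */
    for (; i < n; ++i) {
        rgb[3 * i + 0] = r[i];
        rgb[3 * i + 1] = g[i];
        rgb[3 * i + 2] = b[i];
    }
}
```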

VLD2 loads two or four registers, de-interleaving even and odd elements. This could be used, for example, to split left and right channel stereo audio data. Similarly, VLD3 could be used to split RGB pixels or XYZ coordinates into separate channels. Correspondingly, VLD4 could be used with ARGB or CMYK images.
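The stereo-audio case can be sketched with the `vld2q_s16` intrinsic, which loads 16 interleaved 16-bit samples and returns the even elements (left) and odd elements (right) in two separate registers. The function name `split_stereo` and the signed 16-bit sample format are assumptions for illustration. The scalar loop covers leftover frames and non-NEON builds.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: split interleaved stereo samples (L0 R0 L1 R1 ...)
 * into separate left/right buffers. `frames` is the number of L/R pairs. */
void split_stereo(const int16_t *interleaved, int16_t *left, int16_t *right,
                  size_t frames)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* VLD2 de-interleaves 8 frames per iteration:
     * val[0] receives even elements (left), val[1] odd elements (right). */
    for (; i + 8 <= frames; i += 8) {
        int16x8x2_t lr = vld2q_s16(interleaved + 2 * i);
        vst1q_s16(left + i, lr.val[0]);
        vst1q_s16(right + i, lr.val[1]);
    }
#endif
    /* Scalar tail, and the whole job on non-NEON builds. */
    for (; i < frames; ++i) {
        left[i]  = interleaved[2 * i];
        right[i] = interleaved[2 * i + 1];
    }
}
```

The same pattern extends to VLD3 (`vld3q_u8` returning `uint8x16x3_t`) for RGB data and VLD4 (`vld4q_u8`) for ARGB or CMYK data.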

Note:

These special NEON instructions cannot be expressed in pure C. To have the compiler generate them, you must use NEON intrinsics or assembly code.

Using the Preload Engine to Improve the Cache Hit Rate

ARM Cortex-A9 processors support speculation and out-of-order execution, which can hide the latency of memory accesses. However, accesses to the external memory system are usually so slow that some penalty remains. If you can prefetch instructions or data into the cache before they are needed, you can minimize CPU stall time and maximize CPU performance.
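How far ahead to prefetch is a tuning question. A common rule of thumb, not taken from this note, is to request data at least as many loop iterations ahead as it takes to cover the memory latency. The hypothetical helper below computes that distance by ceiling division; both inputs are assumed, platform-specific estimates that you would measure on your own hardware.

```c
/* Hypothetical helper: loop iterations to prefetch ahead so that a line
 * requested now has (roughly) arrived when the loop reaches it.
 * Inputs are assumed estimates in CPU cycles. */
unsigned prefetch_distance(unsigned mem_latency_cycles, unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0; /* invalid estimate: caller should not prefetch */
    /* Ceiling division: ceil(latency / cycles per iteration). */
    return (mem_latency_cycles + cycles_per_iter - 1) / cycles_per_iter;
}
```

For example, with an assumed 100-cycle miss latency and 12 cycles of work per iteration, the helper returns 9, so data would be prefetched about nine iterations before it is used.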

From a hardware perspective, all preload instructions are handled by a dedicated unit in the Cortex-A9 processor that has its own resources. This avoids consuming resources in the integer core or the load/store unit.

From a software perspective, cache preloading involves three instructions: PLD (data cache preload), PLI (instruction cache preload), and PLDW (preload data with intent to write). On a data cache miss, PLD might trigger a cache linefill while the processor continues to execute other instructions. Used correctly, PLD can significantly improve performance by hiding memory access latency. PLI lets you hint to the processor that an instruction fetch from a particular address is likely to happen soon, which can cause the processor to preload those instructions into its instruction cache.
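One way to request PLD and PLDW from C without writing assembly is the GCC/Clang builtin `__builtin_prefetch(addr, rw)`, where `rw = 0` asks for a read prefetch and `rw = 1` for a write prefetch. On ARM targets the read form is normally emitted as PLD; whether the write form becomes PLDW depends on the compiler and target options. The sketch below scales an array while prefetching both streams once per cache line. The function name, the 32-byte line size, and the 64-element prefetch distance are assumptions to be tuned, not values prescribed by this note.

```c
#include <stddef.h>

/* Assumptions (hypothetical, tune per platform):
 * - 32-byte cache lines, i.e. 8 floats per line;
 * - prefetch 64 floats (8 lines, 256 bytes) ahead of the current position. */
#define LINE_FLOATS    8
#define PREFETCH_AHEAD 64

/* dst[i] = src[i] * gain, issuing one read prefetch (src) and one
 * write prefetch (dst) per cache line, PREFETCH_AHEAD elements ahead. */
void scale_with_prefetch(const float *src, float *dst, size_t n, float gain)
{
    for (size_t i = 0; i < n; i += LINE_FLOATS) {
        if (i + PREFETCH_AHEAD < n) {
            /* rw = 0: read prefetch (PLD on ARM targets). */
            __builtin_prefetch(src + i + PREFETCH_AHEAD, 0);
            /* rw = 1: write prefetch (may become PLDW where supported). */
            __builtin_prefetch(dst + i + PREFETCH_AHEAD, 1);
        }
        /* Process the current line (or the remaining tail). */
        size_t end = (i + LINE_FLOATS < n) ? i + LINE_FLOATS : n;
        for (size_t j = i; j < end; ++j)
            dst[j] = src[j] * gain;
    }
}
```

Prefetching once per line rather than once per element avoids issuing redundant preloads for addresses that fall in the same cache line. The bounds check keeps the prefetch address inside the arrays.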

Lab 2

1. Create a new project in Xilinx SDK.

2. Import the lab 2 source files.

3. Run the source files on hardware.

4. Observe the performance improvement you have obtained.
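The lab source files are not reproduced here, so the following is only a generic sketch of how step 4 might be measured: run a kernel several times and compare elapsed times with and without preloading. The names `time_kernel` and `sum_kernel` are hypothetical, and the timer is the standard C `clock()`. On a bare-metal Zynq target `clock()` may not be backed by a hardware timer, in which case a platform timer (for example, the global timer read by the standalone BSP's `XTime_GetTime`) would be a better choice.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical example workload: sum a static buffer. */
#define BUF_LEN 4096
static float buf[BUF_LEN];
static volatile float sink; /* volatile store keeps the loop from being optimized away */

void sum_kernel(void)
{
    float s = 0.0f;
    for (size_t i = 0; i < BUF_LEN; ++i)
        s += buf[i];
    sink = s;
}

/* Run `kernel` `reps` times; return elapsed processor time in seconds as
 * reported by clock(), or -1.0 if clock() is unavailable. */
double time_kernel(void (*kernel)(void), int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; ++r)
        kernel();
    clock_t stop = clock();
    if (start == (clock_t)-1 || stop == (clock_t)-1)
        return -1.0;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Timing a baseline kernel and a preload-enabled variant with the same `reps` and dividing the two results gives the speedup.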

Figure 5: Demonstration of VST3 Operation
