The Arm CPU architecture is implemented by a variety of microarchitectures, which provide software compatibility across a range of power, performance, and area combinations.

The CPU architecture defines the basic instruction set, as well as the exception handling and memory models that the operating system and virtual machine manager rely on.

A CPU microarchitecture defines the design of a particular processor and determines how that implementation satisfies the architecture contract. It covers power, performance, area, pipeline length, and cache levels.

The central processing unit (CPU) is composed mainly of three parts: the arithmetic logic unit, the control unit, and the registers. It has four major functions: processing instructions, performing operations, controlling timing, and processing data.

Today’s processors (CPUs) fall into three dominant architecture families: the x86 architecture (CISC), represented by Intel and AMD; the ARM architecture (RISC), used in mobile phone and tablet processors; and the MIPS architecture (RISC), chosen for China’s Loongson processors.

An architecture is a set of functional specifications. The ARM architecture, also called the ARM CPU architecture, is the functional specification that ARM-based processors implement.

A microarchitecture covers the bus, power management, the cache, and the ARM architecture it implements.

The ARM architecture, also known as the ARM CPU architecture, covers the instruction set, the register set, the exception model, the memory model, and debug, trace, and profiling.


The ARM architecture version identifies the instruction set that an ARM processor implements. From ARMv1 in 1985 to ARMv9, announced in 2021, ARM has defined nine architecture versions.

They are: ARMv1, ARMv2, ARMv3, ARMv4, ARMv5, ARMv6, ARMv7, ARMv8, ARMv9.

ARMv1 was implemented only in the ARM1 prototype and was never commercialized; ARMv2 was the first version to ship in commercial products;

The CPU corresponding to ARMv3 is ARM6;

ARMv4 adds the Thumb instruction set for the first time;

ARMv5 improves Thumb, adding E (Enhanced DSP Instructions) and J (Java accelerator Jazelle) for the first time;

ARMv6 adds SIMD for the first time, upgrades to Thumb-2, and adds TrustZone for the first time;

ARMv7 adds M (Long Multiplication Instruction), NEON (DSP+SIMD) for the first time;

ARMv8 adds the A64 instruction set for the first time, which enables 64-bit execution, and supports switching between the 32-bit and 64-bit execution states;

ARMv9 adds the Scalable Vector Extension 2 (SVE2) alongside Advanced SIMD, supports both AArch32 and AArch64, and adds the Realm Management Extension (RME) for confidential computing.

The new Armv9 architecture will form the basis of the next wave of 315 billion Arm-based chips. The latest version of the A-profile architecture, Armv9-A, offers the highest performance to date as well as greater security.

Its main features include:

Advanced SIMD with the Scalable Vector Extension 2 (SVE2); the AArch32 and AArch64 execution states; the Realm Management Extension (RME) for confidential computing.


In the 32-bit ARM instruction set, the processor word length is 32 bits, and every instruction is also 32 bits (four bytes) long. Because memory is byte-addressed, each instruction occupies four address units.

In an ARM CPU, the execution of an instruction divides into three basic parts: fetch -> decode -> execute.

In the terminology of ARM’s underlying architecture design, the CPU first fetches the instruction, then decodes it, and then executes it. Completing each operation through these three steps (F, D, E) is called a three-stage pipeline.

Later ARM processors extended this to a five-stage pipeline: fetch -> decode -> execute -> memory access -> write-back.

In general, the more pipeline stages a design has, the finer each step becomes and the higher the throughput it can reach, but the more complex the design is to control. A deeper pipeline therefore usually indicates a more powerful processor.
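
The effect of pipelining on throughput can be sketched with a small calculation. This is a minimal sketch that assumes an idealized pipeline in which every stage takes one cycle and there are no stalls or branch flushes, which real cores do not achieve; the function names are hypothetical.

```python
def pipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed by an ideal pipeline with no stalls."""
    if num_instructions == 0:
        return 0
    # The first instruction takes num_stages cycles to pass through;
    # each later instruction completes one cycle after the previous one.
    return num_stages + (num_instructions - 1)


def unpipelined_cycles(num_instructions: int, num_stages: int) -> int:
    """Cycles needed if each instruction finishes before the next starts."""
    return num_instructions * num_stages


if __name__ == "__main__":
    for stages in (3, 5):
        print(stages, pipelined_cycles(100, stages), unpipelined_cycles(100, stages))
```

Note that in this model a deeper pipeline does not reduce the cycle count. Its benefit comes from each stage doing less work, which allows a higher clock frequency.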

The instruction process can be decomposed further:

1. Instruction prefetch (decide where in memory to take the instruction from)

2. Instruction fetch (read the instruction from the memory system)

3. Instruction decode (interpret the instruction and generate control signals)

4. Register read (provide register values to the execution unit)

5. Dispatch (assign the instruction to an execution unit, that is, to the ALU)

6. Execute (the actual processing in the ALU)

7. Memory access (data access)

8. Register write-back (write the result back to the register)


The ARM32 has a total of 37 32-bit registers, including 31 general-purpose registers and 6 status registers.

R0-R7 are unbanked registers, and R8-R14 are banked registers;

R0-R7 are called the low registers, and R8-R15 are called the high registers;

R0-R12 is a general-purpose register for storing general-purpose data;

R13 is conventionally used as the stack pointer. Other registers can also serve as a stack pointer, but in the Thumb instruction set some instructions always use R13 for this purpose.

R14 is called the link register (LR). When a subroutine is called, R14 receives a backup of R15 (PC); after the subroutine finishes, the value of R14 is copied back to PC. In other words, R14 saves the return address.

R15 is called the program counter (PC). In the ARM state, bits [1:0] are 0 and bits [31:2] hold the PC; in the Thumb state, bit [0] is 0 and bits [31:1] hold the PC.
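
The alignment rules for the PC follow from the instruction sizes. The sketch below checks whether an address is a valid PC value in each state and steps to the next sequential ARM instruction; it models only the address arithmetic, and the function names are hypothetical.

```python
ARM_INSTRUCTION_BYTES = 4  # ARM state: fixed 32-bit instructions


def is_valid_pc(address: int, thumb: bool) -> bool:
    """ARM state needs bits [1:0] clear; Thumb state needs bit [0] clear."""
    mask = 0b1 if thumb else 0b11
    return (address & mask) == 0


def next_sequential_arm_pc(address: int) -> int:
    """Address of the next ARM instruction, wrapping at 32 bits."""
    return (address + ARM_INSTRUCTION_BYTES) & 0xFFFFFFFF


if __name__ == "__main__":
    print(is_valid_pc(0x8002, thumb=False), is_valid_pc(0x8002, thumb=True))
    print(hex(next_sequential_arm_pc(0x8000)))
```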

R0-R3: Generally used for passing function parameters and return values;

R4-R6, R8, R10-R11: These registers have no special regulations, they are ordinary general-purpose registers;

R7: Frame pointer, which points to the location on the stack where the previous frame pointer and the link register were saved;

R9: Operating system reserved;

R12: Called IP, the intra-procedure-call scratch register;

R13: Known as SP (stack pointer), it points to the top of the stack, where saved values such as return addresses are stored;

R14: Known as LR (link register), it holds the return address of the function;

R15: Called the program counter, points to the current instruction address.

STR: stores the contents of a register to memory, for example onto the stack;

Under the ARM64 architecture, the CPU provides 33 registers: the first 31 (numbered 0 to 30) are general-purpose integer registers, and the last two are dedicated registers, the stack pointer (SP) and the program counter (PC).

x0-x7: Used to pass subroutine parameters and return values. They do not need to be preserved by the subroutine; extra parameters are passed on the stack, and a 64-bit return value is placed in x0;

x8: The indirect result register, which holds the address where a subroutine stores a return value that does not fit in registers; it does not need to be preserved;

x9-x15: Temporary (caller-saved) registers, which a subroutine does not need to preserve;

x16-x17: Intra-procedure-call registers (IP0 and IP1), which do not need to be preserved; avoid using them where possible;

x18: The platform register, whose use is platform-dependent; avoid using it where possible;

x19-x28: Callee-saved registers, which a subroutine must preserve if it uses them;

x29: The frame pointer register (FP), used to link stack frames; it must be preserved;

x30: The link register (LR), which holds the return address of the subroutine;

x31: The stack pointer register (SP), which points to the top of the current function’s stack. Register number 31 is not a general-purpose register; in some instructions the same number denotes the zero register (XZR) instead;
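
The register roles above can be summarized as a lookup. This is a minimal sketch that follows the standard ARM64 procedure call convention (AAPCS64) for x0 to x30; the function names and role strings are hypothetical labels chosen for illustration.

```python
def aarch64_register_role(n: int) -> str:
    """Return the conventional role of general-purpose register x<n>."""
    if not 0 <= n <= 30:
        raise ValueError("general-purpose registers are x0 to x30")
    if n <= 7:
        return "argument/result"
    if n == 8:
        return "indirect result"
    if n <= 15:
        return "temporary (caller-saved)"
    if n <= 17:
        return "intra-procedure-call"
    if n == 18:
        return "platform"
    if n <= 28:
        return "callee-saved"
    if n == 29:
        return "frame pointer"
    return "link register"


def callee_must_preserve(n: int) -> bool:
    """True for registers a subroutine must restore before returning."""
    return 19 <= n <= 29


if __name__ == "__main__":
    for n in (0, 8, 9, 16, 18, 19, 29, 30):
        print(f"x{n}: {aarch64_register_role(n)}")
```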

Each of the 31 general-purpose registers of the ARM64 architecture can be used as a 64-bit X-register (X0-X30) or as a 32-bit W register (W0-W30).

For data processing instructions, the choice of X or W register names determines the size of the operation: X registers give a 64-bit calculation, and W registers give a 32-bit calculation.

For example, perform 32-bit integer addition:

ADD W0, W1, W2

For example, perform 64-bit integer addition:

ADD X0, X1, X2
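
The difference between the two additions shows up when the result overflows 32 bits. The sketch below models only the arithmetic of the two ADD forms; it is not an instruction emulator, and the function names are hypothetical.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add_w(a: int, b: int) -> int:
    """Model ADD Wd, Wn, Wm: 32-bit add, result zero-extended to 64 bits."""
    return ((a & MASK32) + (b & MASK32)) & MASK32


def add_x(a: int, b: int) -> int:
    """Model ADD Xd, Xn, Xm: 64-bit add, wrapping at 64 bits."""
    return ((a & MASK64) + (b & MASK64)) & MASK64


if __name__ == "__main__":
    print(hex(add_w(0xFFFFFFFF, 1)))  # wraps to 0x0
    print(hex(add_x(0xFFFFFFFF, 1)))  # 0x100000000
```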

ARM64: The A64 instruction set was introduced in Armv8-A to support 64-bit architectures. The A64 instruction set has a fixed 32-bit instruction length.

ARM32: The A32 instruction set has a fixed 32-bit instruction length and is aligned on a 4-byte boundary. A32 is what was commonly called the ARM instruction set in the Armv6 and Armv7 architectures; Armv8 and later renamed it A32 to distinguish it from A64.

Thumb32: The T32 instruction set was originally introduced as a supplementary set of 16-bit instructions to improve the code density of user code. Over time, T32 evolved into a mixed-length instruction set of 16-bit and 32-bit instructions. T32 is what was known as the Thumb instruction set in the Armv6 and Armv7 architectures; Armv8 and later renamed it T32.
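
Because T32 mixes 16-bit and 32-bit instructions, a decoder has to determine the length from the first halfword. The sketch below applies the Thumb-2 encoding rule that a first halfword whose top five bits are 0b11101, 0b11110, or 0b11111 begins a 32-bit instruction; the function name is hypothetical.

```python
def t32_instruction_length(first_halfword: int) -> int:
    """Return the instruction length in bytes (2 or 4) for a T32 instruction."""
    top5 = (first_halfword >> 11) & 0b11111
    return 4 if top5 in (0b11101, 0b11110, 0b11111) else 2


if __name__ == "__main__":
    print(t32_instruction_length(0x4770))  # BX LR, a 16-bit instruction
    print(t32_instruction_length(0xF000))  # first halfword of a 32-bit BL
```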


ARM is licensed in three main ways: architecture-level licensing, core-level licensing, and use-level licensing.

Of these, the architecture-level license is the highest tier: it allows enterprises to design their own processors that implement the ARM instruction set.

The Arm architecture has two types of interrupt exceptions: IRQ (external interrupt request) and FIQ (fast interrupt request). Both are intended for peripheral interrupts, and each has independent routing control; the two are typically used to implement secure and non-secure interrupts.

When an exception occurs in ARM, the current program flow is interrupted. The processing element (PE) updates the current state and branches to a location in the vector table. Usually this location will contain generic code that pushes the state of the current program onto the stack and then branches out to further code.

Two main instructions generate exceptions: SWI and BKPT.

SWI: the software interrupt instruction. It generates a software interrupt, and the processor enters supervisor mode;

SWI 0 // generates a software interrupt with an immediate value of 0

BKPT: the breakpoint instruction. It causes the processor to take a software breakpoint exception;

ARM’s exception model breaks down as follows:

1. Reset: triggered when the processor is reset, for example when the reset button is pressed while the processor is running;

2. Data Abort: triggered when a data access fails;

3. Fast interrupt (FIQ): an interrupt with a faster response than an ordinary interrupt;

4. External interrupt (IRQ): an ordinary interrupt;

5. Prefetch Abort: while executing an instruction, the processor prefetches the following instructions to prepare them for execution; if an instruction fetch fails, this exception is raised;

6. Software interrupt (SWI): used when software needs to interrupt the normal work of the processor;

7. Undefined Instruction: triggered when the processor cannot recognize an instruction.
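
Each of these exceptions has a fixed slot in the vector table. The sketch below lists the offsets and the mode entered for the classic 32-bit ARM exception model; the table layout reflects the architecture, while the Python names and the `base` parameter are hypothetical.

```python
# Exception name -> (offset in the vector table, processor mode entered).
# Offset 0x14 is reserved in the classic 32-bit ARM vector table.
VECTOR_TABLE = {
    "Reset": (0x00, "SVC"),
    "Undefined Instruction": (0x04, "UND"),
    "SWI": (0x08, "SVC"),
    "Prefetch Abort": (0x0C, "ABT"),
    "Data Abort": (0x10, "ABT"),
    "IRQ": (0x18, "IRQ"),
    "FIQ": (0x1C, "FIQ"),
}


def vector_address(exception: str, base: int = 0x00000000) -> int:
    """Address the processor branches to when the exception is taken."""
    offset, _mode = VECTOR_TABLE[exception]
    return base + offset


if __name__ == "__main__":
    for name, (offset, mode) in VECTOR_TABLE.items():
        print(f"{name}: offset {offset:#04x}, enters {mode} mode")
```

Passing `base=0xFFFF0000` models the high-vectors configuration, in which the table is located at the top of the address space.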

Operating modes of the ARM processor

Different programs require different hardware resources, so the ARM processor provides 7 different combinations of hardware resources, each of which is called an ARM operating mode.

1. USR (user mode): the normal program execution mode of the ARM processor;

2. FIQ (fast interrupt mode): the execution mode for high-speed data transfer or channel processing;

3. IRQ (interrupt mode): the execution mode for general interrupt handling;

4. SVC (supervisor mode): the protected mode used by the operating system;

5. ABT (abort mode): the mode entered when a data access or an instruction prefetch fails;

6. SYS (system mode): runs privileged operating system tasks;

7. UND (undefined mode): the mode entered when an undefined instruction is executed.

From a programming point of view, an ARM microprocessor generally works in one of two states, ARM and Thumb, and supports switching between them.

1. ARM state: the processor executes 32-bit, word-aligned ARM instructions. Most execution takes place in this state.

2. Thumb state: the processor executes 16-bit, halfword-aligned Thumb instructions.
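
Both the operating mode and the ARM/Thumb state are recorded in the current program status register (CPSR). The sketch below decodes them, using the classic 32-bit ARM encoding in which bits [4:0] hold the mode and bit 5 (the T bit) indicates the Thumb state; the function names are hypothetical.

```python
# CPSR mode field M[4:0] -> mode name (classic 32-bit ARM encoding).
MODE_BITS = {
    0b10000: "USR",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "SVC",
    0b10111: "ABT",
    0b11011: "UND",
    0b11111: "SYS",
}

T_BIT = 1 << 5  # set when the processor is in the Thumb state


def current_mode(cpsr: int) -> str:
    """Decode the operating mode from a CPSR value."""
    return MODE_BITS.get(cpsr & 0x1F, "unknown")


def is_thumb(cpsr: int) -> bool:
    """True if the CPSR value indicates the Thumb state."""
    return bool(cpsr & T_BIT)


if __name__ == "__main__":
    cpsr = 0x000000D3  # SVC mode, ARM state, IRQ and FIQ masked
    print(current_mode(cpsr), is_thumb(cpsr))
```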

ARM processor storage format

The ARM32 architecture treats memory as a linear array of bytes starting at address 0, and it supports a maximum address space of 4GB.

ARM64 processors use 48-bit physical addressing, which supports up to 256T of address space. Virtual addresses are still 64 bits wide, so the virtual address range is much larger than the physical one.

Therefore, the processor architecture divides the virtual address space into three parts: user space, the non-canonical area, and kernel space. Kernel space and user space each support up to 256T.

User space: (0x0000_0000_0000_0000 – 0x0000_FFFF_FFFF_FFFF) 256T

Kernel space: (0xFFFF_0000_0000_0000 – 0xFFFF_FFFF_FFFF_FFFF) 256T

The rest is called the non-canonical area.
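
The three-way split can be expressed as a simple range check. This is a minimal sketch using the user-space and kernel-space boundaries listed above; the constant and function names are hypothetical.

```python
USER_END = 0x0000_FFFF_FFFF_FFFF      # last user-space address
KERNEL_START = 0xFFFF_0000_0000_0000  # first kernel-space address
MAX_VA = 0xFFFF_FFFF_FFFF_FFFF


def classify_virtual_address(va: int) -> str:
    """Classify a 64-bit virtual address under a 48-bit split."""
    if not 0 <= va <= MAX_VA:
        raise ValueError("not a 64-bit address")
    if va <= USER_END:
        return "user"
    if va >= KERNEL_START:
        return "kernel"
    return "non-canonical"


if __name__ == "__main__":
    # Each half spans 2**48 bytes, which is the 256T quoted in the text.
    print((USER_END + 1) // 2**40, "T per half")
    print(classify_virtual_address(0x0001_0000_0000_0000))
```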

The kernel space can be subdivided into the following sections:

1. Vmalloc area: 0xFFFF_0000_0000_0000 – 0xFFFF_7BFF_BFFF_0000 (126974G)

2. Vmemmap area: 0xFFFF_7BFF_C000_0000 – 0xFFFF_7FFF_C000_0000 (4096G)

3. PCI I/O area: 0xFFFF_7FFF_FAE0_0000 – 0xFFFF_7FFF_FBE0_0000 (16M)

4. Modules area: 0xFFFF_7FFF_FC00_0000 – 0xFFFF_8000_0000_0000 (64M)

5. Normal memory linear mapping area: 0xFFFF_8000_0000_0000 – 0xFFFF_FFFF_FFFF_FFFF (128T)

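The ARM architecture can store word data in two ways: big-endian mode and little-endian mode.

Big-endian mode (high byte at low address): the high-order bytes of a word are stored at the lower byte addresses, and the low-order bytes are stored at the higher byte addresses.

Little-endian mode (low byte at low address): the reverse arrangement, in which the low-order bytes of a word are stored at the lower byte addresses.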
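
The two byte orders can be compared by laying out the same word in memory. The sketch below uses Python’s built-in `int.to_bytes`; the example value and the function name are arbitrary choices for illustration.

```python
def word_to_memory(word: int, big_endian: bool) -> bytes:
    """Return the 4 bytes of a 32-bit word in ascending address order."""
    return word.to_bytes(4, "big" if big_endian else "little")


if __name__ == "__main__":
    word = 0x12345678
    # Big-endian: the high byte 0x12 sits at the lowest address.
    print(word_to_memory(word, big_endian=True).hex())   # 12345678
    # Little-endian: the low byte 0x78 sits at the lowest address.
    print(word_to_memory(word, big_endian=False).hex())  # 78563412
```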

