Common Fault Locating Methods

Locating a Fault Based on Exception Information

When the system halts because of an unexpected exception, it prints the values of key registers to the serial port, as shown in the following figure. You can use this information to find the function in which the exception occurred and the call stack that led to it.

Exception information

The exception information in the preceding figure is described as follows:

①: Indicates that the exception occurred in kernel space.

②: Indicates the exception type. The far (fault address register) value is the address the CPU was accessing when the exception occurred.

③: The pc value is the address of the instruction being executed when the exception occurred. The klr value is the address of the next instruction to be executed in the call stack. (Note: If traceback 0 lr in ④ has a value, you can ignore klr.)

④: The lr values indicate, in order, the addresses of the instructions that the program counter would execute next under normal conditions, that is, the return addresses along the call stack.

Next, look up the pc and lr addresses in the OHOS_Image.asm file in the out directory. This file is the assembly listing of the flashed system image OHOS_Image.bin. Find the function that contains the pc address, then the function that contains each lr address in turn. Together, these functions give you the function call chain at the moment the exception occurred.

Checking Memory Pool Integrity

Exception information alone is not always enough to locate a fault; sometimes the register values themselves are incorrect. If you suspect that the fault is caused by heap memory being overwritten, call LOS_MemIntegrityCheck to check the integrity of the memory pool. LOS_MemIntegrityCheck traverses all nodes in the system's dynamic memory pool. If all nodes are intact, it returns 0 and prints nothing; otherwise, it prints error information. To check the system memory pool, pass (VOID *)OS_SYS_MEM_ADDR as the argument.

Typically, you call LOS_MemIntegrityCheck before and after the suspected service logic code to find where the heap memory is overwritten. If the check passes before the code runs but fails afterward, the overwrite happened inside that code; if both checks pass, that code is not the cause. Repeating this on smaller and smaller sections of code narrows down the fault.

Locating Memory Overwriting for a Global Variable

If the memory of a global variable is accessed illegally (for example, its value changes unexpectedly), find the address of the global variable in the OHOS_Image.map file and pay special attention to the recently used variables located just before that address. The overwrite most likely happened while one of those preceding variables was being used, especially an array or a variable that was cast to another type.