Please keep in mind that using the strace utility to diagnose problems is a task best suited to a systems administrator with the necessary skills, training, and expertise. Although this technique is not related to cPanel or its basic configuration, we offer this guide as a courtesy for those who are interested in learning how to better diagnose problems on Linux. If this guide does not provide enough information for you to trace a process and diagnose your problem, you will need to reach out to a systems administrator for further assistance.
Because the process is complex and labor-intensive, you would usually turn to this technique only as a last resort. Unless you know exactly what you're doing, you should explore every human-readable log and online resource before using this diagnostic tool.
The strace utility captures data on the system calls that a process makes. The resulting trace provides a wealth of information about what the process does and the errors it encounters at runtime. This can help you learn about undocumented internals of a piece of software, get more detailed information about errors, see which files a process opens, and more.
You should review the manual page for this software, strace(1), to learn more about its options, usage, and other details.
This guide provides basic, general information about the use of strace. We also have a more targeted guide for tracing cPanel and WHM processes:
First, determine if the process that you need to trace will already be running at the time that you need to reproduce the error or problem, or if you can manually start the process on the command line at the same time that you start the trace. Use the first section below if you need to trace an already running process. Look for the last section in this article if you need to manually execute the software to reproduce the problem.
Tracing Processes that are Already Running
If your process of interest is a daemon or other software that will already be running when you need to trace it, you'll first need to find the process ID of the running software. You'll then use that process ID in the strace command, reproduce the issue, and review the resulting output.
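As a rough sketch of the PID lookup described above, the process list can be searched by name instead of being read by eye. The helper name pids_by_name is hypothetical and exists only for this illustration; it assumes a Linux system with /proc mounted, and it looks up the shell running the example only so that the sketch is self-contained.

```shell
#!/bin/sh
# pids_by_name is a hypothetical helper for this sketch. It scans /proc and
# prints the PID of every process whose name (the "comm" field) matches the
# first argument exactly.
pids_by_name() {
    for dir in /proc/[0-9]*; do
        # A process can exit while the loop runs, so skip unreadable entries.
        [ -r "$dir/comm" ] || continue
        read -r name < "$dir/comm" || continue
        if [ "$name" = "$1" ]; then
            echo "${dir#/proc/}"
        fi
    done
}

# Demonstration only: look up the shell that is running this example.
# In practice you would pass the name of your daemon instead.
read -r self_name < "/proc/$$/comm"
pids_by_name "$self_name"
```

Where the procps tools are installed, pgrep -x followed by the process name should produce the same list in one command. Any PID found this way is the value to substitute for PID in the strace command.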
1. First, use the screen utility to start a screen session as the root user via SSH or Terminal. This lets you run multiple shell sessions at once so that you can quickly and easily manage the tasks that need to be performed at the same time during this guide. If you are not familiar with screen, or you prefer not to use it, you can also log in to the server multiple times via SSH in different windows to accomplish the same effect. Please see the screen man page if you need further guidance for using it. Man Pages - screen(1)
screen -S tracingProcesses
2. Issue the following command to view all of the running processes on the server. When you find your process of interest, make note of the process ID (aka PID).
ps auxf | less
3. Once you have the process ID, write it down or make note of it for use in step 4. Then create a new screen window by pressing the following shortcut sequence (press CTRL and a together, release, then press c). You'll now see a fresh terminal without any of the previous ps auxf information in it.
CTRL + a + c
4. Issue the following command to attach a trace to the process. Please keep in mind that this can generate a very large amount of text very quickly, depending on how active the process is. If you run the strace command for an extended period of time, you can generate gigabytes of text data and cause the server to run out of disk space if you are not monitoring the server's disk usage closely. Please note that you'll need to replace PID in the command below with the process ID that you collected in step 2 above.
strace -ttvvff -s 1000 -o nameYourTraceOutputFileAnythingYouLike.trace -p PID
5. You should see that the command you just ran attached to your process. It will also print the process IDs of any child processes that the original process creates as it attaches to them.
6. Now, do whatever is required to reproduce the problem or error that you are trying to diagnose.
7. If you must leave the strace running for a long period of time to wait for the problem to occur, you should create a new screen window and actively monitor the disk usage of the server to make sure that you are not running out of disk space. Use the following shortcut command to create a new screen window:
CTRL + a + c
To monitor the disk usage of your server, check it periodically (for example, with the df -h command) and compare the results to see how quickly usage is changing.
To page through all of the screen windows that you have open, use the following shortcut:
CTRL + a + n
8. Once you have reproduced the error, you should stop the strace process as quickly as possible to limit the amount of unrelated data captured. The output generated by strace is extremely verbose, and the sheer volume of data can make it very difficult to review, so it is important to limit the capture to the period of time when the problem occurs. To stop the strace process, first use the following shortcut to page through your screen windows until you are back at the window where you issued the strace command:
CTRL + a + n
Once you are back at the strace window, issue the following shortcut to stop the trace:
CTRL + c
9. You should now see at least one file (but usually many) in the current directory, each ending with one of the process IDs that the trace attached to. You could examine all of these files individually with the less command, but it is usually more useful to interleave them into a single stream in which all of the system calls are ordered by the time they were made. You can do this with the following command:
strace-log-merge nameYourTraceOutputFileAnythingYouLike.trace | less
Note that you'll use the same filename that you originally used for the -o option when you ran the trace. You should not include the PID numbers at the end of the files.
10. Examining the output can be extremely difficult. Most people use text processing tools such as grep, awk, and cut to filter out everything except the important or relevant data.
Sometimes, when you are not exactly sure what you are looking for, you may need to review all of the raw data manually with the less command, which can be extremely time-consuming.
One technique is to start at the bottom of the trace and work your way up until you see where the problem or error occurred because ideally you will have stopped the trace very shortly after the problem occurred.
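A minimal sketch of this bottom-up approach follows. The helper name trace_tail is hypothetical, and the demonstration file contains placeholder lines rather than real strace output; it exists only so that the example runs on its own.

```shell
#!/bin/sh
# trace_tail is a hypothetical helper name for this sketch.
# Usage: trace_tail FILE [LINES]
# Prints the last LINES lines of FILE (50 when LINES is omitted).
trace_tail() {
    tail -n "${2:-50}" "$1"
}

# Stand-in file so the example is self-contained. These are placeholder
# lines, not real strace output.
demo=$(mktemp)
i=1
while [ "$i" -le 200 ]; do
    echo "placeholder line $i" >> "$demo"
    i=$((i + 1))
done

# Show only the final 5 lines, the activity closest to when tracing stopped.
trace_tail "$demo" 5

rm -f "$demo"
```

With a real trace, piping the strace-log-merge output into tail -n 50 gives the same kind of window, and less +G opens a file with the view already positioned at the end.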
Most of the trace output that you review will be unintelligible unless you are intimately familiar with Linux system calls. If you don't know what most of the output means, your efforts are best spent looking for a human-readable error message buried within the other data. If needed, you can also research the meaning of specific Linux system calls that seem related. However, because of the sheer amount of information involved, this can be extremely time-consuming and will often not help much unless you already know exactly what you're looking for.
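One way to surface such errors, sketched here under some assumptions, is to keep only the lines where a system call returned -1 together with an errno name such as ENOENT or EACCES. The helper name failed_calls is hypothetical, and the sample lines below were written by hand to imitate the general shape of merged output (PID, timestamp, call); they are not captured from a real process, and real output may be laid out slightly differently.

```shell
#!/bin/sh
# failed_calls is a hypothetical helper name for this sketch. It keeps only
# lines where a call returned -1 followed by an errno name (ENOENT, EACCES...).
failed_calls() {
    grep -E ' = -1 E[A-Z0-9]+ ' "$1"
}

# Hand-written sample lines for illustration only, not real strace output.
sample=$(mktemp)
cat > "$sample" <<'EOF'
4021 10:15:02.000101 openat(AT_FDCWD, "/etc/hosts", O_RDONLY|O_CLOEXEC) = 3
4021 10:15:02.000245 openat(AT_FDCWD, "/etc/example.conf", O_RDONLY) = -1 ENOENT (No such file or directory)
4022 10:15:02.000390 connect(4, {sa_family=AF_UNIX, sun_path="/var/run/example.sock"}, 110) = -1 ECONNREFUSED (Connection refused)
4021 10:15:02.000512 close(3) = 0
EOF

# Print every failed call.
failed_calls "$sample"

# Count failures per errno name to see which error dominates.
failed_calls "$sample" |
    awk '{ for (i = 1; i <= NF - 2; i++)
               if ($i == "=" && $(i + 1) == "-1") print $(i + 2) }' |
    sort | uniq -c

rm -f "$sample"
```

Many failed calls are harmless, such as a program probing several paths for an optional file, so the filtered list is a starting point for reading rather than a diagnosis.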
Tracing a Process by Executing It Directly
1. Determine the exact command that is needed to reproduce the problem or error.
2. Issue the following command via SSH or Terminal as the root user. Please replace the trace output file name and "yourCommandHere" as required to match your specific situation.
strace -ttvvff -s 1000 -o nameYourTraceOutputFileAnythingYouLike.trace yourCommandHere
3. Do whatever is required to reproduce the error or problem.
4. If the process does not automatically stop, you can use the following keyboard shortcut to stop it:
CTRL + c
5. Now you can use steps 9 and 10 from the process above to review the produced output.
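If you run traces this way often, the command from step 2 could be wrapped in a small shell function, sketched below. The name trace_cmd is hypothetical and not part of strace; the flags are the ones this guide already uses, and the comments summarize what each does according to the strace manual page.

```shell
#!/bin/sh
# trace_cmd is a hypothetical wrapper name for this sketch.
# Usage: trace_cmd OUTPUT_PREFIX COMMAND [ARGS...]
trace_cmd() {
    prefix=$1
    shift
    # -tt      prefix each line with the time of day, including microseconds
    # -vv      print unabbreviated versions of structures such as stat results
    # -ff      with -o, write each traced process to its own OUTPUT_PREFIX.PID file
    # -s 1000  print up to 1000 characters of each string argument
    # "$@"     passes the command and its arguments through with quoting intact
    strace -ttvvff -s 1000 -o "$prefix" "$@"
}
```

For example, trace_cmd nameYourTraceOutputFileAnythingYouLike.trace yourCommandHere runs the command under strace, after which strace-log-merge can be pointed at the same file name as in step 9.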