eolas/neuron/5ce25cd2-1b2b-459a-bb58-283daadf8753/Monitoring_processes_and_resources.md

199 lines
7 KiB
Markdown
Raw Normal View History

2024-12-09 18:34:15 +00:00
---
tags:
- systems-programming
- Linux
- procedural
---
# Monitoring processes and resources
## General purpose diagnostic programs (memory, CPU, I/O)
### `top`/`htop`
We can use [ps](Processes.md) to list the
currently running processes but it does not provide much information about the
resource metrics or how the process changes over time. We can use `top` to get
more information.
`top` provides an interactive interface for the information that `ps` displays.
It updates in real time and shows the most active processes based on the CPU
time that they are utilising. You can also order by memory usage.
_Here I have pressed `u` to show only the processes associated with my user:_
![](static/htop.png)
### Main commands
| Command | Action |
| ------- | ------------------------------- |
| -u | Show processes by selected user |
| M | Sort by memory usage |
| P | Sort by cumulative CPU usage |
| ? | View key and explanation |
### Understanding the categories
- `Main/IO`
- The first covers all processes. The second focuses on input/output processes
(i.e. reading and writing to disks and other devices)
- `PRI`
- This stands for _priority_. This metric reflects the kernel's current
schedule priority for the process. The higher the value, it is less likely
that the kernel will schedule the process if there are competing processes
that require CPU time. The lower the value, the greater priority this
process has over others.
- `NI`
- This stands for _nice value_. This metric exists in order to allow
administrators to nudge or influence the priority of a given process. You
cannot directly tell the kernel to _do x now instead of y_ but you can make
what are effectively suggestions by manipulating the nice value.
- The kernel adds the nice value to the current priority value for the given
process to determine its next time slot. When you increase the nice value of
process _P_ you are being "nicer" to the other processes by influencing the
priority of _P_ downwards so that the other processes receive greater
precedence from the kernel.
- By default, the nice value will be 0. To reduce priority of PID 1234, you
would use:
```bash
$ renice 20 1234
```
- `VIRT`
- The total amount of
[virtual memory](Virtual_memory_and_the_MMU_in_Linux.md) used by
the process including: program code, data, shared libraries, pages that have
been swapped, pages that have been mapped but not used.
- `RES`
- Stands for _resident size_
- The non swapped _physical_ memory the process has used
- `SHR`
- The size of the process's
[shared pages](Virtual_memory_and_the_MMU_in_Linux.md#shared-pages)
- `S`
- Status:
- S for sleeping (idle)
- R for running
- D for disk sleep
### `vmstat`
`vmstat` provides similar metrics to `htop` but tells you more about the memory
state and the activities of the kernel in a single row.
The default output is a single line with the averages since boot. You can add a
delay parameter (in secs) which will then output at that interval, allowing you
to see memory usage in realtime, e.g:
```
$ vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
2 0 0 4326768 334228 5050952 0 0 8 19 80 10 4 1 94 0 0
0 0 0 4365520 334260 5054468 0 0 0 125 2140 3434 4 1 94 0 0
1 0 0 4382400 334276 5068940 0 0 0 77 2102 3988 3 1 95 0 0
1 0 0 4434000 334288 5052908 0 0 0 25 2859 4278 6 1 92 0 0
0 0 0 4391576 334304 5086484 0 0 0 110 2899 6480 8 3 90 0 0
```
- `procs`
- The number of runnable processes (`r`) and the number of blocked (`b`)
processes
- `memory`
- The core memory output distinguishing:
- Total kbs swapped to disk
- Total kbs free
- Total kbs currently in
[buffer](Role_of_memory_in_computation.md#relation-between-cache-and-buffers)
and not written
- Total amount of virtual memory in the
[cache](Role_of_memory_in_computation.md#relation-between-cache-and-buffers)
- `swap`
- Distinguishes amount of memory
[swapped](Swap_space.md) in (`si`) to memory and
swapped out (`so`) to disk
- `io`
- Disk actions
- Amount of data read from harddisk (`bi`)
- Amount of data written to harddisk (`bo`)
- `system`
- The number of times the kernel switches to kernel code
- `cpu`
- Percentage of the different CPU behaviours:
- Responding to user tasks (`us`)
- Time that it is idle (`id`)
## Files being used by active processes: `lsof`
`lsof` stands for _list open files_. It lists opened files and the processes
using them. Without modifiers it outputs a huge amount of data. The best way to
use it is to execute it against a specific PID. For example the below output
gives me some useful info about which files VS Code is using:
![](static/lsof.png)
## System calls: `strace`
A system call is when a process requests a service from the
[kernel](The_kernel.md), for instance an I/O operation to
memory. We can trace these system calls with `strace`.
## CPU performance
We can use the `uptime` program to assess overall CPU performance in the form of
a load average.
> Load average is the number of active processes currently ready to run. It is
> an estimate of the number of processes that are capable of using the CPU at
> any given time.
`Uptime` gives you three load averages:
```bash
$ uptime
11:19:16 up 14 days, 3:53, 1 user, load average: 0.84, 0.57, 0.50
```
- The three numbers are load averages for the past 1 minute, 5 minutes and 15
minutes respectively.
- A load average close to 0 is usually a good sign because it means that your
processor isn't being challenged and you are conserving power. Anything equal
to or above 1 means that a single process is using the CPU nearly all the
time. You can identify that process with `htop` and it will obviously be near
to the top. (This is often caused by Chrome and Electron-based software.)
## Memory status
We know that processes primarily interact with virtual memory in the form of
pages which are then translated to physical blocks by the kernel via the
[MMU](Virtual_memory_and_the_MMU_in_Linux.md). There are several tools
which provide windows onto this process.
### System page size
We can view the overall system page size which is a representation of the amount
of virtual memory available:
```bash
$ getconf PAGE_SIZE
4096
```
This will typically be the same for all Linux systems.
### `free` : available physical memory
`free` displays the total amount of free and¬used physical and swap memory in
the system, as well as the
[buffers and caches](Role_of_memory_in_computation.md#relation-between-cache-and-buffers)
used by the kernel.
```bash
$ free
total used free shared buff/cache available
Mem: 16099420 5931512 5039344 2046460 5128564 7781904
Swap: 3145724 0 3145724
```