[bootlin/training-materials updates] master: debugging: add perf example for crc_random example (5f1e4c3a)
Clément Léger
clement.leger at bootlin.com
Tue Nov 15 18:17:08 CET 2022
Repository : https://github.com/bootlin/training-materials
On branch : master
Link : https://github.com/bootlin/training-materials/commit/5f1e4c3adc5c494b10498e59b8f2b053696a930c
>---------------------------------------------------------------
commit 5f1e4c3adc5c494b10498e59b8f2b053696a930c
Author: Clément Léger <clement.leger at bootlin.com>
Date: Tue Nov 15 18:15:40 2022 +0100
debugging: add perf example for crc_random example
Signed-off-by: Clément Léger <clement.leger at bootlin.com>
>---------------------------------------------------------------
5f1e4c3adc5c494b10498e59b8f2b053696a930c
.../nfsroot/root/system_profiling/crc_random.c | 6 +-
.../debugging-system-wide-profiling.tex | 105 +++++++++++++++------
2 files changed, 78 insertions(+), 33 deletions(-)
diff --git a/lab-data/debugging/nfsroot/root/system_profiling/crc_random.c b/lab-data/debugging/nfsroot/root/system_profiling/crc_random.c
index 29192857..006f5bcf 100644
--- a/lab-data/debugging/nfsroot/root/system_profiling/crc_random.c
+++ b/lab-data/debugging/nfsroot/root/system_profiling/crc_random.c
@@ -72,9 +72,6 @@ int main(int argc, char **argv)
while (1) {
- clock_gettime(CLOCK_MONOTONIC, &start);
- start_nano = timespec_to_nano(&start);
-
ret = read(urandom_fd, data, RANDOM_SIZE);
if (ret < 0) {
perror("Failed to read random data\n");
@@ -82,8 +79,9 @@ int main(int argc, char **argv)
return EXIT_FAILURE;
}
+ clock_gettime(CLOCK_MONOTONIC, &start);
+ start_nano = timespec_to_nano(&start);
crc32(data, RANDOM_SIZE);
-
clock_gettime(CLOCK_MONOTONIC, &end);
end_nano = timespec_to_nano(&end);
diff --git a/labs/debugging-system-wide-profiling/debugging-system-wide-profiling.tex b/labs/debugging-system-wide-profiling/debugging-system-wide-profiling.tex
index 636d2155..b8766941 100644
--- a/labs/debugging-system-wide-profiling/debugging-system-wide-profiling.tex
+++ b/labs/debugging-system-wide-profiling/debugging-system-wide-profiling.tex
@@ -2,17 +2,13 @@
{System wide profiling and tracing}
{Objectives:
\begin{itemize}
- \item System profiling with {\em perf}.
\item IRQ latencies using {\em ftrace}.
\item Tracing and visualizing system activity using {\em kernelshark} or
{\em LTTng}.
+ \item System profiling with {\em perf}.
\end{itemize}
}
-\section{System profiling with {\em perf}}
-
-
-
\section{IRQ latencies using {\em ftrace}}
{\em ftrace} features a specific tracer for IRQ latency, named {\em irqsoff}.
@@ -37,50 +33,61 @@ latency ?
\section{LTTng}
-Go into the primes directory and open the \code{primes.c} file. This file
-computes primes number and output them in a file provided as an argument.
-Compile it using the provided toolchain
+Go into the \code{crc_random} directory and open the \code{crc_random.c} file. This file
+computes a crc32 on a buffer of random data. Compile it using the provided toolchain:
\begin{bashinput}
- $(CROSS_COMPILE)-gcc -Wall -Werror primes.c -o primes
+$ cd /home/<user>/debugging-labs/nfsroot/root/system_profiling
+$ $(CROSS_COMPILE)-gcc -Wall -Werror crc_random.c -o crc_random
\end{bashinput}
In order to observe our program's performance, we want to instrument it with
-tracepoints. We would like to know two things:
-
-\begin{itemize}
- \item How much time it takes to generate a prime number.
- \item How much times it takes to write data to the output file.
-\end{itemize}
+tracepoints. We would like to know how much time it takes to compute the
+crc32 of a specific buffer.
In order to do so, add tracepoints to your program which will allow you to measure
-this. We'll add 4 tracepoints:
+this. We'll add 2 tracepoints:
\begin{itemize}
- \item One for the start of prime number computation
- \item Another for the end of computation
- \item One for the start of writing to disk
- \item Another for the end of writing to disk
+ \item One for the start of crc32 computation (\code{compute_crc_entry})
+ \item Another for the end of crc32 computation (\code{compute_crc_exit})
\end{itemize}
For that, create a tracepoint provider header file template named
-\code{primes-tp.h} and another one for the tracepoint provider named
-\code{primes-tp.c}. You can then use the new tracepoints in your program to
-trace specific points of execution.
+\code{crc_random-tp.h} and another one for the tracepoint provider named
+\code{crc_random-tp.c}. These tracepoints should belong to the \code{crc_random}
+provider namespace.
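The lab leaves writing the provider files to the reader; as a rough sketch under the naming assumptions above (a \code{crc_random} provider with payload-free \code{compute_crc_entry} and \code{compute_crc_exit} events), \code{crc_random-tp.h} could look like:

```c
#undef TRACEPOINT_PROVIDER
#define TRACEPOINT_PROVIDER crc_random

#undef TRACEPOINT_INCLUDE
#define TRACEPOINT_INCLUDE "./crc_random-tp.h"

#if !defined(_CRC_RANDOM_TP_H) || defined(TRACEPOINT_HEADER_MULTI_READ)
#define _CRC_RANDOM_TP_H

#include <lttng/tracepoint.h>

/* Two payload-free events in the crc_random provider namespace */
TRACEPOINT_EVENT(crc_random, compute_crc_entry, TP_ARGS(), TP_FIELDS())
TRACEPOINT_EVENT(crc_random, compute_crc_exit, TP_ARGS(), TP_FIELDS())

#endif /* _CRC_RANDOM_TP_H */

#include <lttng/tracepoint-event.h>
```

\code{crc_random-tp.c} then only needs \code{#define TRACEPOINT_CREATE_PROBES}, \code{#define TRACEPOINT_DEFINE} and \code{#include "crc_random-tp.h"}; in \code{crc_random.c} you would call \code{tracepoint(crc_random, compute_crc_entry);} and \code{tracepoint(crc_random, compute_crc_exit);} around the crc computation.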
-Once added, you can compile your application using the following command.
+You can then use the new tracepoints in your program to trace specific points
+of execution. Once added, you can compile your application using the following
+command:
\begin{bashinput}
-$ $(CROSS_COMPILE)-gcc -I. primes-tp.c primes.c -llttng-ust -o primes
+$ $(CROSS_COMPILE)-gcc -I. crc_random-tp.c crc_random.c -llttng-ust -o crc_random
\end{bashinput}
-Enable the program tracepoints, run it and collect tracepoints.
+Finally, on the target, enable the program tracepoints, run it and collect
+tracepoints.
-Download \code{tracecompass} latest version from the eclipse website:
-(https://www.eclipse.org/tracecompass/) and extract it using:
+\begin{bashinput}
+$ lttng-sessiond --daemonize
+$ lttng create crc_session
+$ lttng enable-event --userspace crc_random:compute_crc_entry
+$ lttng enable-event --userspace crc_random:compute_crc_exit
+$ lttng start
+$ ./crc_random
+$ lttng destroy
+\end{bashinput}
+
+Using \code{babeltrace2}, analyze the traces and see the time that is spent
+between the tracepoints.
+
+In order to analyze our traces, we are going to use tracecompass. Download
+the latest version of \code{tracecompass} and extract it using:
\begin{bashinput}
- tar -xvf trace-compass-*.tar.gz
+$ wget https://ftp.fau.de/eclipse/tracecompass/releases/8.1.0/rcp/trace-compass-8.1.0-20220919-0815-linux.gtk.x86_64.tar.gz
+$ tar -xvf trace-compass-*.tar.gz
\end{bashinput}
Run it
@@ -88,3 +95,43 @@ Run it
$ cd trace-compass*
$ ./tracecompass
\end{bashinput}
+
+\section{System profiling with {\em perf}}
+
+In order to profile the whole system, we are going to use perf and try to find
+the function that takes up most of the execution time.
+
+First of all, we will run a global recording of functions and their backtraces
+on all CPUs for 10 seconds using the following command:
+
+\begin{bashinput}
+$ perf record -F 99 -g -- sleep 10
+\end{bashinput}
+
+This command will generate a \code{perf.data} file that can be used on a remote
+computer for further analysis. Copy that file and fix the permissions using
+\code{chown}:
+
+\begin{bashinput}
+$ sudo cp /home/<user>/debugging-labs/nfsroot/root/system_profiling/perf.data .
+$ sudo chown <user>:<user> perf.data
+\end{bashinput}
+
+We will then use perf report to visualize the acquired data:
+
+\begin{bashinput}
+$ perf report -k /home/<user>/debugging-labs/nfsroot/root/vmlinux
+\end{bashinput}
+
+
+Another useful tool for performance analysis is the flamegraph. The latest perf
+version includes built-in support for generating flamegraphs, but the template
+is not available on Debian, so we will use the scripts provided by Brendan
+Gregg instead.
+
+\begin{bashinput}
+$ git clone https://github.com/brendangregg/FlameGraph.git
+$ perf script | ./FlameGraph/stackcollapse-perf.pl | ./FlameGraph/flamegraph.pl > flame.html
+\end{bashinput}
+
+Using this flamegraph, analyze the system load.
\ No newline at end of file