From e544221d97a57a9be7ba18a0df52a15b4cc76f97 Mon Sep 17 00:00:00 2001 From: Jeff Hostetler Date: Fri, 22 Feb 2019 14:25:00 -0800 Subject: trace2: Documentation/technical/api-trace2.txt Created design document for Trace2 feature. Signed-off-by: Jeff Hostetler Signed-off-by: Junio C Hamano diff --git a/Documentation/technical/api-trace2.txt b/Documentation/technical/api-trace2.txt new file mode 100644 index 0000000..2de565f --- /dev/null +++ b/Documentation/technical/api-trace2.txt @@ -0,0 +1,1349 @@ += Trace2 API + +The Trace2 API can be used to print debug, performance, and telemetry +information to stderr or a file. The Trace2 feature is inactive unless +explicitly enabled by enabling one or more Trace2 Targets. + +The Trace2 API is intended to replace the existing (Trace1) +printf-style tracing provided by the existing `GIT_TRACE` and +`GIT_TRACE_PERFORMANCE` facilities. During initial implementation, +Trace2 and Trace1 may operate in parallel. + +The Trace2 API defines a set of high-level messages with known fields, +such as (`start`: `argv`) and (`exit`: {`exit-code`, `elapsed-time`}). + +Trace2 instrumentation throughout the Git code base sends Trace2 +messages to the enabled Trace2 Targets. Targets transform these +messages content into purpose-specific formats and write events to +their data streams. In this manner, the Trace2 API can drive +many different types of analysis. + +Targets are defined using a VTable allowing easy extension to other +formats in the future. This might be used to define a binary format, +for example. + +== Trace2 Targets + +Trace2 defines the following set of Trace2 Targets. +Format details are given in a later section. + +`GIT_TR2` (NORMAL):: + + a simple printf format like GIT_TRACE. ++ +------------ +$ export GIT_TR2=~/log.normal +$ git version +git version 2.20.1.155.g426c96fcdb +------------ ++ +------------ +$ cat ~/log.normal +12:28:42.620009 common-main.c:38 version 2.20.1.155.g426c96fcdb +12:28:42.620989 common-main.c:39 start git version +12:28:42.621101 git.c:432 cmd_name version (version) +12:28:42.621215 git.c:662 exit elapsed:0.001227 code:0 +12:28:42.621250 trace2/tr2_tgt_normal.c:124 atexit elapsed:0.001265 code:0 +------------ + +`GIT_TR2_PERF` (PERF):: + + a column-based format to replace GIT_TRACE_PERFORMANCE suitable for + development and testing, possibly to complement tools like gprof. ++ +------------ +$ export GIT_TR2_PERF=~/log.perf +$ git version +git version 2.20.1.155.g426c96fcdb +------------ ++ +------------ +$ cat ~/log.perf +12:28:42.620675 common-main.c:38 | d0 | main | version | | | | | 2.20.1.155.g426c96fcdb +12:28:42.621001 common-main.c:39 | d0 | main | start | | | | | git version +12:28:42.621111 git.c:432 | d0 | main | cmd_name | | | | | version (version) +12:28:42.621225 git.c:662 | d0 | main | exit | | 0.001227 | | | code:0 +12:28:42.621259 trace2/tr2_tgt_perf.c:211 | d0 | main | atexit | | 0.001265 | | | code:0 +------------ + +`GIT_TR2_EVENT` (EVENT):: + + a JSON-based format of event data suitable for telemetry analysis. ++ +------------ +$ export GIT_TR2_EVENT=~/log.event +$ git version +git version 2.20.1.155.g426c96fcdb +------------ ++ +------------ +$ cat ~/log.event +{"event":"version","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.620713","file":"common-main.c","line":38,"evt":"1","exe":"2.20.1.155.g426c96fcdb"} +{"event":"start","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621027","file":"common-main.c","line":39,"argv":["git","version"]} +{"event":"cmd_name","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621122","file":"git.c","line":432,"name":"version","hierarchy":"version"} +{"event":"exit","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621236","file":"git.c","line":662,"t_abs":0.001227,"code":0} +{"event":"atexit","sid":"1547659722619736-11614","thread":"main","time":"2019-01-16 17:28:42.621268","file":"trace2/tr2_tgt_event.c","line":163,"t_abs":0.001265,"code":0} +------------ + +== Enabling a Target + +A Trace2 Target is enabled when the corresponding environment variable +(`GIT_TR2`, `GIT_TR2_PERF`, or `GIT_TR2_EVENT`) is set. The following +values are recognized. + +`0`:: +`false`:: + + Disables the target. + +`1`:: +`true`:: + + Enables the target and writes stream to `STDERR`. + +`[2-9]`:: + + Enables the target and writes to the already opened file descriptor. + +``:: + + Enables the target, opens and writes to the file in append mode. + +`af_unix:[:]`:: + + Enables the target, opens and writes to a Unix Domain Socket + (on platforms that support them). ++ +Socket type can be either `stream` or `dgram`. If the socket type is +omitted, Git will try both. + +== Trace2 API + +All public Trace2 functions and macros are defined in `trace2.h` and +`trace2.c`. All public symbols are prefixed with `trace2_`. + +There are no public Trace2 data structures. + +The Trace2 code also defines a set of private functions and data types +in the `trace2/` directory. These symbols are prefixed with `tr2_` +and should only be used by functions in `trace2.c`. + +== Conventions for Public Functions and Macros + +The functions defined by the Trace2 API are declared and documented +in `trace2.h`. It defines the API functions and wrapper macros for +Trace2. + +Some functions have a `_fl()` suffix to indicate that they take `file` +and `line-number` arguments. + +Some functions have a `_va_fl()` suffix to indicate that they also +take a `va_list` argument. + +Some functions have a `_printf_fl()` suffix to indicate that they also +take a varargs argument. + +There are CPP wrapper macros and ifdefs to hide most of these details. +See `trace2.h` for more details. The following discussion will only +describe the simplified forms. + +== Public API + +All Trace2 API functions send a messsage to all of the active +Trace2 Targets. This section describes the set of available +messages. + +It helps to divide these functions into groups for discussion +purposes. + +=== Basic Command Messages + +These are concerned with the lifetime of the overall git process. + +`void trace2_initialize()`:: + + Determines if any Trace2 Targets should be enabled and + initializes the Trace2 facility. This includes starting the + elapsed time clocks and thread local storage (TLS). ++ +This function emits a "version" message containing the version of git +and the Trace2 protocol. ++ +This function should be called from `main()` as early as possible in +the life of the process. + +`int trace2_is_enabled()`:: + + Returns 1 if Trace2 is enabled (at least one target is + active). + +`void trace2_cmd_start(int argc, const char **argv)`:: + + Emits a "start" message containing the process command line + arguments. + +`int trace2_cmd_exit(int exit_code)`:: + + Emits an "exit" message containing the process exit-code and + elapsed time. ++ +Returns the exit-code. + +`void trace2_cmd_error(const char *fmt, va_list ap)`:: + + Emits an "error" message containing a formatted error message. + +`void trace2_cmd_path(const char *pathname)`:: + + Emits a "cmd_path" message with the full pathname of the + current process. + +=== Command Detail Messages + +These are concerned with describing the specific Git command +after the command line, config, and environment are inspected. + +`void trace2_cmd_name(const char *name)`:: + + Emits a "cmd_name" message with the canonical name of the + command, for example "status" or "checkout". + +`void trace2_cmd_mode(const char *mode)`:: + + Emits a "cmd_mode" message with a qualifier name to further + describe the current git command. ++ +This message is intended to be used with git commands having multiple +major modes. For example, a "checkout" command can checkout a new +branch or it can checkout a single file, so the checkout code could +emit a cmd_mode message of "branch" or "file". + +`void trace2_cmd_alias(const char *alias, const char **argv_expansion)`:: + + Emits an "alias" message containing the alias used and the + argument expansion. + +`void trace2_def_param(const char *parameter, const char *value)`:: + + Emits a "def_param" message containing a key/value pair. ++ +This message is intended to report some global aspect of the current +command, such as a configuration setting or command line switch that +significantly affects program performance or behavior, such as +`core.abbrev`, `status.showUntrackedFiles`, or `--no-ahead-behind`. + +`void trace2_cmd_list_config()`:: + + Emits a "def_param" messages for "important" configuration + settings. ++ +The environment variable `GIT_TR2_CONFIG_PARAMS` can be set to a +list of patterns of important configuration settings, for example: +`core.*,remote.*.url`. This function will iterate over all config +settings and emit a "def_param" message for each match. + +`void trace2_cmd_set_config(const char *key, const char *value)`:: + + Emits a "def_param" message for a specific configuration + setting IFF it matches the `GIT_TR2_CONFIG_PARAMS` pattern. ++ +This is used to hook into `git_config_set()` and catch any +configuration changes and update a value previously reported by +`trace2_cmd_list_config()`. + +`void trace2_def_repo(struct repository *repo)`:: + + Registers a repository with the Trace2 layer. Assigns a + unique "repo-id" to `repo->trace2_repo_id`. ++ +Emits a "worktree" messages containing the repo-id and the worktree +pathname. ++ +Region and data messages (described later) may refer to this repo-id. ++ +The main/top-level repository will have repo-id value 1 (aka "r1"). ++ +The repo-id field is in anticipation of future in-proc submodule +repositories. + +=== Child Process Messages + +These are concerned with the various spawned child processes, +including shell scripts, git commands, editors, pagers, and hooks. + +`void trace2_child_start(struct child_process *cmd)`:: + + Emits a "child_start" message containing the "child-id", + "child-argv", and "child-classification". ++ +Before calling this, set `cmd->trace2_child_class` to a name +describing the type of child process, for example "editor". ++ +This function assigns a unique "child-id" to `cmd->trace2_child_id`. +This field is used later during the "child_exit" message to associate +it with the "child_start" message. ++ +This function should be called before spawning the child process. + +`void trace2_child_exit(struct child_proess *cmd, int child_exit_code)`:: + + Emits a "child_exit" message containing the "child-id", + the child's elapsed time and exit-code. ++ +The reported elapsed time includes the process creation overhead and +time spend waiting for it to exit, so it may be slightly longer than +the time reported by the child itself. ++ +This function should be called after reaping the child process. + +`int trace2_exec(const char *exe, const char **argv)`:: + + Emits a "exec" message containing the "exec-id" and the + argv of the new process. ++ +This function should be called before calling one of the `exec()` +variants, such as `execvp()`. ++ +This function returns a unique "exec-id". This value is used later +if the exec() fails and a "exec-result" message is necessary. + +`void trace2_exec_result(int exec_id, int error_code)`:: + + Emits a "exec_result" message containing the "exec-id" + and the error code. ++ +On Unix-based systems, `exec()` does not return if successful. +This message is used to indicate that the `exec()` failed and +that the current program is continuing. + +=== Git Thread Messages + +These messages are concerned with Git thread usage. + +`void trace2_thread_start(const char *thread_name)`:: + + Emits a "thread_start" message. ++ +The `thread_name` field should be a descriptive name, such as the +unique name of the thread-proc. A unique "thread-id" will be added +to the name to uniquely identify thread instances. ++ +Region and data messages (described later) may refer to this thread +name. ++ +This function must be called by the thread-proc of the new thread +(so that TLS data is properly initialized) and not by the caller +of `pthread_create()`. + +`void trace2_thread_exit()`:: + + Emits a "thread_exit" message containing the thread name + and the thread elapsed time. ++ +This function must be called by the thread-proc before it returns +(so that the coorect TLS data is used and cleaned up. It should +not be called by the caller of `pthread_join()`. + +=== Region and Data Messages + +These are concerned with recording performance data +over regions or spans of code. + +`void trace2_region_enter(const char *category, const char *label, const struct repository *repo)`:: + +`void trace2_region_enter_printf(const char *category, const char *label, const struct repository *repo, const char *fmt, ...)`:: + +`void trace2_region_enter_printf_va(const char *category, const char *label, const struct repository *repo, const char *fmt, va_list ap)`:: + + Emits a thread-relative "region_enter" message with optional + printf string. ++ +This function pushes a new region nesting stack level on the current +thread and starts a clock for the new stack frame. ++ +The `category` field is an arbitrary category name used to classify +regions by feature area, such as "status" or "index". At this time +it is only just printed along with the rest of the message. It may +be used in the future to filter messages. ++ +The `label` field is an arbitrary label used to describe the activity +being started, such as "read_recursive" or "do_read_index". ++ +The `repo` field, if set, will be used to get the "repo-id", so that +recursive oerations can be attributed to the correct repository. + +`void trace2_region_leave(const char *category, const char *label, const struct repository *repo)`:: + +`void trace2_region_leave_printf(const char *category, const char *label, const struct repository *repo, const char *fmt, ...)`:: + +`void trace2_region_leave_printf_va(const char *category, const char *label, const struct repository *repo, const char *fmt, va_list ap)`:: + + Emits a thread-relative "region_leave" message with optional + printf string. ++ +This function pops the region nesting stack on the current thread +and reports the elapsed time of the stack frame. ++ +The `category`, `label`, and `repo` fields are the same as above. +The `category` and `label` do not need to match the correpsonding +"region_enter" message, but it makes the data stream easier to +understand. + +`void trace2_data_string(const char *category, const struct repository *repo, const char *key, const char * value)`:: + +`void trace2_data_intmax(const char *category, const struct repository *repo, const char *key, intmax value)`:: + +`void trace2_data_json(const char *category, const struct repository *repo, const char *key, const struct json_writer *jw)`:: + + Emits a region- and thread-relative "data" or "data_json" message. ++ +This is a key/value pair message containing information about the +current thread, region stack, and repository. This could be used +to print the number of files in a directory during a multi-threaded +recursive tree walk. + +`void trace2_printf(const char *fmt, ...)`:: + +`void trace2_printf_va(const char *fmt, va_list ap)`:: + + Emits a region- and thread-relative "printf" message. + +== Trace2 Target Formats + +=== NORMAL Format + +NORMAL format is enabled when the `GIT_TR2` environment variable is +set. + +Events are written as lines of the form: + +------------ +[