HeadSpin Documentation
Documentation

Replica DB

The HeadSpin Replica DB securely exports or "syncs" User Flow measurement data to an external Postgres database. (See Performance Monitoring UI for an overview of User Flows.) The Postgres database can be accessed directly by HeadSpin users to perform SQL queries against the raw data. On top of the Replica DB, users can create Grafana dashboards (see Grafana) or build out their own custom ETL pipelines and data visualizations using their own software stacks.

The export unit of the Replica DB is a user flow; user flows are basically a list of sessions including the aggregation of many different measurements for these sessions. Linked session data, such as data about the session's network traffic, issues, and time series, are exported into shared, auxiliary tables.

Managing the Replica DB can be done via the API or via the UI:

  • UI: Replica Export in the UI
  • API: Replica Export in the API

Replica DB Structure

Organizations that use the Replica DB are each granted readonly access to their own isolated database. This single database is shared across all user flows in the system. When exporting the data to the replica, database credentials are created and returned to the user, either via the UI or API. However, these credentials will have a limited number of available connections. As such, make sure to configure the database pool size or similar settings when configuring external tools like Grafana to properly ration these database connections.

By default, 5 simultaneous database connections are permitted; contact HeadSpin to increase this limit.

Each replica contains a schema that will have various tables, each of which has some <code class="dcode">table_type</code>. The schema can be acquired in various ways using the UI or API, but for now, note that each replica will have the tables that match some of the following table types that are shared for all user flows:

  • <code class="dcode">USER_FLOW_METADATA</code>: The user flow metadata.
  • <code class="dcode">PERF_MEASUREMENTS</code>: The performance measurements exported.
  • <code class="dcode">SESSION_METADATA</code>: The session metadata.
  • <code class="dcode">SESSION_TAGS</code>: The tags for the exported sessions.
  • <code class="dcode">SESSION_LABELS</code>: The labels for the exported sessions.
  • <code class="dcode">SESSION_TIMESERIES</code>: The timeseries metadata.
  • <code class="dcode">SESSION_TIMESERIES_DATA</code>: The (x,y) timeseries data.
  • <code class="dcode">SESSION_TIMESERIES_DATA_STATS</code>: The computed summary statistics from the label-driven time series data analysis.
  • <code class="dcode">SESSIONS</code>: The session network traffic data (Capture Network Metadata).
  • <code class="dcode">ISSUES</code>: The session issue data, as computed by HeadSpin analysis.
  • <code class="dcode">FUNCTION_CALL_METADATA</code>: Function call profile (FCP) metadata.
  • <code class="dcode">FUNCTION_CALL_PROCESS</code>: FCP process data (e.g., id, timestamps, name).
  • <code class="dcode">FUNCTION_CALL_METHOD</code>: FCP method data (e.g., id, name, process).
  • <code class="dcode">FUNCTION_CALL_THREAD</code>: FCP thread data (e.g., id, process).
  • <code class="dcode">FUNCTION_CALL_SITE</code>: FCP call site data (e.g., id, method, thread, process).

There is also a legacy table that stores the performance measurements for the relevant user flow:

  • <code class="dcode">PERF_TASK</code>: The legacy user flow export table.

This table is still exported for now, but is deprecated; this stores the performance measurements as a separate column for each measurement.

The shared tables, along with common queries on these tables, are discussed below. Note that this list is not exhaustive of the kinds of queries that are possible.

User Flow Table (<code class="dcode">user_flow</code>)

This table stores some basic metadata about the user flows that have been exported into replica. The columns consist of:

  • <code class="dcode">name (varchar(255))</code>: The name of the User Flow (as it appears in the UI).
  • <code class="dcode">user_flow_id (char(36))</code>: The unique ID of the user flow.
  • <code class="dcode">description (varchar(65535))</code>: The description of the User Fow (as it appears in the UI).
  • <code class="dcode">created_ts (timestamp)</code>: The UNIX timestamp at which the user flow was created.

This table is useful to help when JOINing across other tables, but is otherwise very simple.

Session Metadata Table (<code class="dcode">session_metadata</code>)

This table stores the basic metadata common to all sessions in the system, in particular:

  • <code class="dcode">session_id (char(36))</code>: The unique session ID, used in many places.
  • <code class="dcode">user_flow_id (char(36))</code>: The user flow this session pertains to.
  • <code class="dcode">hostname (varchar(255))</code>, <code class="dcode">host_country (varchar(255))</code>, <code class="dcode">host_city (varchar(255)</code>), <code class="dcode">host_lat</code> (<code class="dcode">double precision</code>), <code class="dcode">host_lng</code> (<code class="dcode">double precision</code>): The session's host information.
  • <code class="dcode">device_id (char(50))</code>: The ID of the device used. Note: the unique ID for any HeadSpin device is <code class="dcode">{device_id}@{hostname}</code> which is known as the "device address".
  • <code class="dcode">device_model (varchar(255))</code>: The device-reported model name.
  • <code class="dcode">device_sku (varchar(255))</code>: The SKU of the device in which the session was captured on.
  • <code class="dcode">device_carrier (varchar(255))</code>: The name(s) of any network carrier(s) associated with SIM cards on the device. Carriers are separated using commas. One or more commas indicates one or more SIM cards on the device. Commas with no carriers indicates multiple SIM cards available but not connected to a carrier.
  • <code class="dcode">device_os_version (varchar(20))</code>: The version of the operating system on the device.
  • <code class="dcode">host_os (varchar(255))</code>: The operating system of the host associated with a device. For browsers this is the operating system of the device the browser is running on.
  • <code class="dcode">host_os_version (varchar(255))</code>: The operating system version of the host associated with a device. For browsers this is the operating system version of the device the browser is running on.
  • <code class="dcode">session_link (varchar(255))</code>: Waterfall UI link to the corresponding session.
  • <code class="dcode">name (varchar(255))</code>: A meaningful name assigned to the session using session annotation API.
  • <code class="dcode">description (varchar(65535))</code>: A description assigned to the session using session annotation API.
  • <code class="dcode">status (char(10))</code>: The functional status of this session in the user flow (one of: <code class="dcode">Passed</code>, <code class="dcode">Failed</code>, or <code class="dcode">Excluded</code>). <code class="dcode">Passed</code> and <code class="dcode">Failed</code> indicate the functional test success or failure, respectively. <code class="dcode">Excluded</code> indicates an error or issue occurred on the HeadSpin side. This field is set by the user when adding a session to a user flow.
  • <code class="dcode">start_time (timestamp)</code>: The timestamp of when the session was requested to start.
  • <code class="dcode">end_time (timestamp)</code>: The timestamp of when the session was requested to end.

A common query to list all of the sessions for a user flow might be:


select sm.session_id FROM session_metadata AS sm
JOIN user_flow AS uf ON sm.user_flow_id = uf.user_flow_id
WHERE uf.name = '<user flow name>'

Performance Measurements (<code class="dcode">performance_measurements</code>)

This table stores all of the performance measurements calculated for each session. A "performance measurement" is a key, value pair typically representing the name and value of an aggregated metric summarizing some aspect of the session data. The columns are as follows:

  • <code class="dcode">session_id (char(36)): The unique session ID the measurement pertains to.
  • <code class="dcode">key_name (varchar(128)): The (user-facing) name of the performance measurement.
  • <code class="dcode">key_type (varchar(128)): The type of measurement (i.e., <code class="dcode">HeadSpin Issue Impact</code>).
  • <code class="dcode">key_units (varchar(128)): The units for the measurement.
  • <code class="dcode">str_value (varchar(512))</code>: The measurement value (cast as a string).
  • <code class="dcode">float_value (double precision)</code>: The measurement value (cast as a float).

Note that both the <code class="dcode">str_value</code> and <code class="dcode">float_value</code> are provided to handle both numeric and non-numeric measurement types; the <code class="dcode">str_value</code> field should always be filled in, but the <code class="dcode">float_value</code> may be <code class="dcode">NULL</code> if the type is not numeric.

A possible query to list some performance measurement for the 10 most recent sessions in a user flow might look like:


SELECT pm.session_id, pm.key_name, pm.key_type, pm.key_units, pm.str_value, pm.float_value
FROM performance_measurements AS pm
JOIN session_metadata AS sm ON pm.session_id = sm.session_id
WHERE sm.user_flow_id = '<user_flow_id>' AND pm.key_name = '<key_name>'
ORDER BY sm.start_time DESC
LIMIT 10

Session Tags (<code class="dcode">session_tags</code>)

This table stores all of the session tags for all of the exported sessions. The session tags are a list of <code class="dcode">key, value</code> pairs that are attached to sessions. Each session tag has:

  • <code class="dcode">session_id (char(36))</code>: The unique session ID the tag is associated with.
  • <code class="dcode">tag_key (varchar(64))</code>: The key for the tag -- usually a descriptive name.
  • <code class="dcode">tag_value (varchar(128))</code>: The value for the tag -- usually some specific data or metadata about the session.

A possible query to list tags for the 10 most recent sessions in a user flow might be:


SELECT st.session_id, st.tag_key, st.tag_value
FROM session_tags AS st
JOIN session_metadata AS sm ON st.session_id = sm.session_id
WHERE sm.user_flow_id = '<user_flow_id>'
ORDER BY sm.start_time DESC
LIMIT 10

Session Labels (<code class="dcode">session_labels</code>)

This table stores all of the session labels for all of the exported sessions. Each session label has a variety of fields that can be queried against. Many organizations use labels to identify or flag portions of a session they are interested in, so this data can be varied. This table exports all labels for richer queries. The session_labels table includes:

  • <code class="dcode">session_id (char(36))</code>: The unique session ID the label is associated with.
  • <code class="dcode">label_id (char(36))</code>: The unique ID of the label.
  • <code class="dcode">label_type (varchar(256))</code>: Metadata regarding the intended purpose of the label. The default type is <code class="dcode">user</code>, although other types are available. See the documentation on Annotating Your Session for the available label types and their usage.
  • <code class="dcode">name (varchar(1024))</code>: The name of the label. This is a required field for any label and is typically the most specific description of the event the label represents.
  • <code class="dcode">category (varchar(1024))</code>: Optional category description of the label. Label categories are customizable descriptions that enable grouping semantically similar labels.
  • <code class="dcode">start_time (float)</code>: The start time (in units of milliseconds) for the label relative to the session start time.
  • <code class="dcode">end_time</code>: The end time (in units of milliseconds) for the label relative to the session start time. The end time may be equal to start_time to denote an instantaneous point of time.
  • <code class="dcode">data (varchar(65535))</code>: Any data that was attached to the label. Data may be arbitrary strings, but is often valid JSON.
  • <code class="dcode">label_group_id (char(36))</code>: Optional label group ID that the label is associated with.
  • <code class="dcode">video_box (varchar(65535))</code>: Optional coordinates of bounding boxes provided as nested arrays in the format <code class="dcode">[[x0, y0, x1, y1], ...]</code>.

A possible query to list the average duration of some label for the 10 most recent sessions in the user flow might look like:


SELECT sl.session_id, AVG(sl.end_time - sl.start_time) AS avg_duration
FROM session_labels AS sl
WHERE sl.session_id IN (
    SELECT DISTINCT session_id FROM session_metadata
    WHERE user_flow_id = '<user_flow_id>'
    ORDER BY start_time DESC
    LIMIT 10
) AND sl.name = 'home'
GROUP BY sl.session_id
ORDER BY avg_duration

Session Timeseries (<code class="dcode">session_timeseries</code> and <code class="dcode">session_timeseries_data</code>)

For the session timeseries, there are two primary tables to store the data. In practice, these tables are usually the largest, so efficient queries here should restrict the ranges of values to parse here as much as possible.

The <code class="dcode">session_timeseries</code> table stores metadata about the time series:

  • <code class="dcode">session_id (char(36))</code>: The unique session ID this time series is associated with.
  • <code class="dcode">timeseries_key (varchar(512))</code>: A machine-accessible name for the time series used in the <code class="dcode">session_timeseries_data</code> table and time series REST APIs.
  • <code class="dcode">timeseries_name (varchar(512))</code>: A human-readable name of the time series.
  • <code class="dcode">x_units (varchar(512))</code>: The units for the x-values of the time series. Typically time since session capture start in units of milliseconds.
  • <code class="dcode">y_units (varchar(512))</code>: The units for the y-values of the time series data.

The <code class="dcode">session_timeseries_data</code> table stores the individual time series data points:

  • <code class="dcode">session_id</code> (<code class="dcode">char(36)</code>): The unique session ID the data point is associated with.
  • <code class="dcode">timeseries_key</code> (<code class="dcode">varchar(512)</code>): A machine-accessible name for the time series used in the <code class="dcode">session_timeseries</code> table and time series REST APIs.
  • <code class="dcode">x_value</code> (<code class="dcode">float</code>): The x-value of the time series data point. This is typically milliseconds since session capture start.
  • <code class="dcode">y_value</code> (<code class="dcode">float</code>): The y-value of the time series data point.

In summary, the <code class="dcode">session_timeseries</code> table stores the details about what the data in the <code class="dcode">session_timeseries_data</code> table represents. A query to fetch some timeseries for the top 5 most recent sessions in a user flow might be:


SELECT sd.session_id, sd.x_value, sd.y_value, st.x_units, st.y_units
FROM session_timeseries_data AS sd
JOIN session_timeseries AS st
    ON sd.session_id = st.session_id AND sd.timeseries_key = st.timeseries_key
WHERE sd.session_id IN (
    SELECT DISTINCT session_id FROM session_metadata
    WHERE user_flow_id = '<user_flow_id>'
    ORDER BY start_time DESC
    LIMIT 5
) AND st.timeseries_name = '<timeseries_name>'
ORDER BY session_id, x_value

Session Timeseries Data Stats (<code class="dcode">session_timeseries_data_stats</code>)

This table stores the computed summary statistics from the label-driven time series data analysis (see the "Summary Statistics" time series data analysis in the Session Analysis API documentation).

The table has the following schema:

  • <code class="dcode">session_id</code> (<code class="dcode">char(36)</code>): The unique session ID the summary statistic is associated with.
  • <code class="dcode">label_group_id</code> (<code class="dcode">char(36)</code>): The unique label group ID the summary statistic is associated with.
  • <code class="dcode">timeseries_key</code> (<code class="dcode">varchar(512)</code>): A machine-accessible name for the time series on which for the summary statistic was computed.
  • <code class="dcode">metric</code> (<code class="dcode">varchar(512)</code>): The name of the summary statistic (e.g., mean).
  • value (double precision): The value of the summary statistic.
  • <code class="dcode">time_intervals_ms</code> (<code class="dcode">varchar(65535)</code>): The ordered, nonoverlapping time intervals (in milliseconds) on which the summary statistic was computed.

An example query is shown below where the computed summary statistics (for <code class="dcode">video_quality_mos</code> time series) and their time intervals (on which the summary statistics are computed) are fetched from sessions that belong in a certain user flow (with id <code class="dcode">'&lt;user_flow_id>'</code>) and were captured since <code class="dcode">&lt;start_time></code>.


SELECT sm.session_id, stds.metric, stds.value, sl.start_time, sl.end_time
FROM session_metadata AS sm
INNER JOIN session_timeseries_data_stats AS stds
    ON sm.session_id = stds.session_id
INNER JOIN session_labels AS sl
    ON stds.label_group_id = sl.label_group_id
WHERE sm.user_flow_id = '<user_flow_id>'
    AND stds.timeseries_key = 'video_quality_mos'
    AND sm.start_time > '<start_time>'

Capture Issues (<code class="dcode">capture_issues</code>)

This table stores the exported HeadSpin Issues identified in the session analysis. This table includes:

  • <code class="dcode">session_id</code> (<code class="dcode">char(36)</code>): The unique session ID the Issue is associated with.
  • <code class="dcode">issue_id</code> (<code class="dcode">char(36)</code>): The ID of the associated Issue. An issue_id is guaranteed to be unique per <code class="dcode">session_id</code> and <code class="dcode">category</code>.
  • <code class="dcode">impact_id</code> (<code class="dcode">char(36)</code>): The ID of the associated Issue Impact. An Issue Impact is an event on the session timeline demarcating where the user flow was affected by an Issue identified by the HeadSpin analysis. An <code class="dcode">impact_id</code> is guaranteed to be unique per <code class="dcode">session_id</code>, <code class="dcode">category</code>, and <code class="dcode">issue_id</code>.
  • <code class="dcode">message_id</code> (<code class="dcode">bigint</code>): The network message ID associated with a particular Issue Impact if the Issue is associated with network traffic. The network <code class="dcode">message_id</code> may be used to JOIN on the <code class="dcode">capture_sessions</code> table to query info about the affected network exchange(s). A <code class="dcode">message_id</code> is guaranteed to be unique per <code class="dcode">session_id</code>.
  • <code class="dcode">category</code> (<code class="dcode">varchar(255)</code>): The name of the Issue card as presented in the Waterfall UI or documented in the HeadSpin Issue documentation.
  • <code class="dcode">group</code> (<code class="dcode">varchar(16)</code>): The issue distinguished into four types as documented in the HeadSpin Issue documentation.
  • <code class="dcode">name</code> (<code class="dcode">varchar(65535)</code>): The host name of the affected server involved in the network message exchange (identified in the <code class="dcode">message_id</code> column) for applicable network-based Issue cards. This is <code class="dcode">NULL</code> for non-network-based Issues.
  • <code class="dcode">impact_time</code>: The amount of time in units of milliseconds that the session was affected by the corresponding HeadSpin Issue. If the <code class="dcode">reference</code> column is <code class="dcode">true</code>, the impact time should be zero. Zero impact time indicates that this Issue Impact was the first (or "reference") event in a chain of events that resulted in suboptimal performance. For example, for the <code class="dcode">Duplicate Messages</code> Issue card, the first observed message is marked as a <code class="dcode">reference</code> impact and all following observations of this exact message are considered issues.
  • <code class="dcode">start_time_ms</code> (<code class="dcode">bigint</code>): The start time of the Issue Impact relative to session capture start in units of milliseconds.
  • <code class="dcode">end_time_ms</code> (<code class="dcode">bigint</code>): The end time of the Issue Impact relative to session capture start in units of milliseconds.
  • <code class="dcode">version</code> (<code class="dcode">varchar(16)</code>): The version of the HeadSpin Issue analysis used to derive this issue.
  • <code class="dcode">reference</code> (<code class="dcode">boolean</code>): When true, indicates that the Impact is not considered to adversely affect performance.

This table stores the different HeadSpin Issues and their corresponding Impacts (see the HeadSpin Issues documentation for details), and can be useful to find Issues that are most responsible for a poor user experience or suboptimal performance. For example, a query to list the Issues with the largest impact time for the most recent 10 sessions in some user flow might be:


SELECT ci.session_id, ci.category, ci.name, ci.impact_time
FROM capture_issues AS ci
WHERE (ci.session_id, ci.impact_time) IN (
    SELECT session_id, MAX(impact_time) AS max_impact_time
    FROM capture_issues AS issues
    WHERE session_id IN (
        SELECT session_id FROM session_metadata AS sm
        WHERE sm.user_flow_id = '<user_flow_id>'
        ORDER BY sm.start_time DESC
        LIMIT 10
    )
    GROUP BY session_id
)

Capture Network Metadata (<code class="dcode">capture_sessions</code>)

This table contains metadata about the captured network message exchanges for the sessions in the user flow. This table typically stores captured network data for HTTP/HTTPS connections along with some extracted information for the requests. This table includes:

  • <code class="code">session_id</code> (<code class="dcode">char(36)</code>): The unique session ID the message is associated with.
  • <code class="dcode">message_id</code> (<code class="dcode">bigint</code>): The network exchange message ID. The <code class="dcode">network message_id</code> may be used to JOIN on the <code class="dcode">capture_issues</code> table to query info about any HeadSpin Issues associated with particular network exchanges. A <code class="dcode">message_id</code> is guaranteed to be unique per <code class="dcode">session_id</code>.
  • <code class="dcode">start_time</code> (<code class="dcode">timestamp</code>): The timestamp the request started.
  • <code class="dcode">end_time</code> (<code class="dcode">timestamp</code>): The timestamp the request ended.
  • <code class="dcode">duration</code> (<code class="dcode">bigint</code>): The duration for the request, response exchange to complete in units of milliseconds.
  • <code class="dcode">tcp_connect</code> (<code class="dcode">bigint</code>): The time taken to complete the TCP handshake in units of milliseconds.
  • <code class="dcode">tls_handshake</code> (<code class="dcode">bigint</code>): The time to complete the TLS handshake in units of milliseconds.
  • <code class="dcode">wait</code> (<code class="dcode">bigint</code>): The amount of time spent waiting for a response (time to first byte) in units of milliseconds.
  • <code class="dcode">receive</code> (<code class="dcode">bigint</code>): The amount of time spent downloading the response in units of milliseconds.
  • <code class="dcode">status</code> (<code class="dcode">bigint</code>): The HTTP status code of the response.
  • <code class="dcode">content_type</code> (<code class="dcode">varchar(512)</code>): The content type sent in the response as given in the HTTP header.
  • <code class="dcode">request_size</code> (<code class="dcode">bigint</code>): The size of the request in bytes.
  • <code class="dcode">response_size</code> (<code class="dcode">bigint</code>): The size of the response in bytes.
  • <code class="dcode">url</code> (<code class="dcode">varchar(65535)</code>): The URL for the request.

These fields are useful for examining HTTP requests made by the application and identifying any slow requests. For example, a query to find the longest request for each of the 10 most recent sessions,(as determined by the <code class="dcode">duration</code> field) might be:


SELECT cs.session_id, cs.duration, cs.url
FROM capture_sessions AS cs
WHERE (cs.session_id, cs.duration) IN (
    SELECT network.session_id, MAX(network.duration) AS max_duration
    FROM capture_sessions AS network
    JOIN session_metadata AS sm ON network.session_id = sm.session_id
    WHERE sm.session_id IN (
        SELECT DISTINCT session_id FROM session_metadata
        WHERE user_flow_id = '1c89aa46-0807-11ea-bee9-02db88d209ba'
        ORDER BY start_time DESC
        LIMIT 10
    )
    GROUP BY network.session_id
)

Function Call Profile Tables

Depending on the configuration of the app and/or device under test, some sessions will capture function call profile data (this is the data used to render the flame graph in the Waterfall UI). This data is exported in a normalized form to a series of tables that have the following schema:

  • function_call_metadata: Metadata pertaining to the function call capture tooling, including:

    • session_id (char(36)): The unique session ID the profile data is associated with.
    • version (varchar(32)): The schema version.
    • platform (varchar(32)): The device platform (i.e., Android or iOS).
    • has_cpu_time (boolean): Whether or not CPU time metrics are available.
    • start_time_us (bigint): The profile start time in microseconds since session start. This is typically zero.
    • end_time_us (bigint): The profile end time in microseconds since session start.
  • function_call_process: Process level information, including:

    • session_id (char(36)): The unique session ID this data is associated with.
    • id (bigint): The process ID. A process ID is guaranteed to be unique per session_id.
    • name (varchar(512)): The process name.
    • start_time_us (bigint): The process start time in microseconds since session start.
  • function_call_method: Method information, including:

    • session_id (char(36)): The unique session ID this data is associated with.
    • process_id (bigint): The process ID the method was called in.
    • id (bigint): The method ID. A method ID is guaranteed to be unique per session_id and process_id.
    • raw_name (varchar(512)): The raw method signature. For Android, the semantics are defined here. The documentation for iOS is not currently available.
    • name (varchar(256)): The human-readable function name. Not available for iOS.
    • args (varchar(256)): The human-readable function argument types. Not available for iOS.
    • return (varchar(256)): The human readable function return types. Not available for iOS.
    • signature (varchar(65535)): The full human readable method signature. Not available for iOS.
    • package_name (varchar(512)): The method's package name.
  • function_call_thread: Thread-level information, including:

    • session_id (char(36)): The unique session ID this data is associated with.
    • process_id (bigint): The process id that created the thread.
    • id (bigint): The thread ID. Thread IDs are unique at the process level; queries should specify the session_id, process_id, and thread_id.
    • name: The thread name, if available.
    • flag: The thread flag, if available. Defaults to 0 (0x01 system thread).
    • start_time_us: The thread start time in microseconds since session start.
    • end_time_us: The thread end time in microseconds since session start.
  • function_call_site: Call site information, including:

    • session_id (char(36)): The unique session ID this data is associated with.
    • process_id (bigint): The process ID that created the thread.
    • thread_id (bigint): The thread ID.
    • method_id (bigint): The call's method ID.
    • id (bigint): The call site ID. Call IDs are unique at the thread level; queries should specify the session_id, process_id, thread_id, and call_id.
    • start_time_us (bigint): The call start time (in microseconds) since session start (this is wall/clock time).
    • end_time_us (bigint): The call end time (in microseconds) since session start (this is wall/clock time).
    • cpu_time_us (bigint): The CPU time (in microseconds) spent in the call.
    • parent_call_id (bigint): The ID of the parent call site.
    • depth (bigint): The depth of the call in the hierarchical call tree. The depth for the root level calls is by definition NULL; the child calls of the root level calls have a depth of 1, the children of those calls have a depth of 2, and so on.

These tables store the function call profile data in a normalized manner; in general you will want to join on <code class="dcode">session_id</code> as well as <code class="dcode">process_id</code> and <code class="dcode">thread_id</code> where appropriate. Note that method IDs are unique at the process level, so you can join a call site to a method by joining on (<code class="dcode">session_id</code>, <code class="dcode">process_id</code>, and <code class="dcode">method_id</code>). Below are some example queries to demonstrate how one might interact with this tables. For the example queries that follow, assume the session <code class="dcode">d3bf5f07-e910-11eb-a576-acde48001122</code> has function call data that you'd like to query against.

Suppose you want to select all function call profile data captures longer than 80 seconds:


SELECT *
FROM function_call_metadata
WHERE end_time_us - start_time_us > 80000000;

Suppose you want to select all call sites from session <code class="dcode">d3bf5f07-e910-11eb-a576-acde48001122</code> called from a process named "some.known.process.name" that spent more than 50 ms of CPU time:


SELECT *
FROM
    function_call_site c
INNER JOIN  function_call_process p ON
    c.session_id = p.session_id AND
    c.process_id = p.id
WHERE
    p.name = 'some.known.process.name' AND
    c.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
    c.cpu_time_us > 50000
ORDER by c.cpu_time_us DESC;

Suppose you want to find the deepest call site in each session:


SELECT session_id, MAX(depth) as max_depth
FROM function_call_site
GROUP BY session_id
ORDER BY max_depth DESC;

Suppose you want to get the thread names in each session:


SELECT session_id, name
FROM function_call_thread
GROUP BY session_id, name
ORDER BY name;

Suppose you want to get the time spent in each thread in a particular session:


SELECT p.id, p.name, t.id, t.name, (t.end_time_us - t.start_time_us) as thread_time
FROM function_call_thread t
INNER JOIN function_call_process p ON
    t.session_id = p.session_id AND
    t.process_id = p.id
WHERE t.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122';

Suppose you want to select all calls in session <code class="dcode">d3bf5f07-e910-11eb-a576-acde48001122</code> that have a method <code class="dcode">signature</code> matching "java.lang.reflect", and then sum the amount of time spent in those calls:


SELECT m.raw_name, m.process_id, SUM(c.wall_time)
FROM (
    SELECT *, (c.end_time_us - c.start_time_us) as wall_time
    FROM function_call_site c
) AS c
INNER JOIN function_call_method m ON
    c.session_id = m.session_id AND
    c.process_id = m.process_id AND
    c.method_id = m.id
WHERE
    c.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
    m.raw_name LIKE '%java.lang.reflect%'
GROUP BY m.raw_name, m.process_id;

The individual call sites are stored in the <code class="dcode">function_call_site</code> table and have a hierarchical/nested relationship. If call site <code class="dcode">a</code> calls call site <code class="dcode">b</code>, then call site <code class="dcode">b</code> is stored as a row with <code class="dcode">parent_call_id</code> corresponding to the id of call site <code class="dcode">a</code> (this is otherwise known as an "Adjacency List"). There are a number of useful queries that the "Adjacency List" pattern supports. Suppose you want to find the root call sites across all processes and threads in this session:


SELECT id
FROM function_call_site
WHERE
    session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
    parent_call_id IS NULL;

Suppose you want to find the immediate child calls of some particular call site with <code class="dcode">id</code> 123456789. This can be done with:


SELECT *
FROM function_call_site
WHERE
    session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
    parent_call_id = 123456789;

Note however that this does not traverse the hierarchy to arbitrary depth; we'll cover that in a later example. Suppose you want to find all the "leaf" nodes (i.e., call sites with no child calls). This can be done with:


SELECT *
FROM function_call_site c1
LEFT JOIN function_call_site c2 ON c2.parent_call_id = c1.id
WHERE
    c1.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
    c2.id is NULL;

The database engines used for Replica DB all support recursive common table expressions (CTEs), which are useful for traversing these trees for more practical use cases involving hierarchical data. For instance, suppose you wanted to retrieve the call sites that were called under the call site with the <code class="dcode">id</code> 4696066816, while constructing the method path for each call site (i.e., a top-down traversal from some particular node). This can be done with:


WITH RECURSIVE CallTree
    (id, start_time_us, end_time_us, method_path)
AS (
    SELECT c.id, c.start_time_us, c.end_time_us, m.signature as method_path
    FROM function_call_site c
    INNER JOIN function_call_method m ON
        (c.session_id = m.session_id AND c.process_id = m.process_id AND c.method_id = m.id)
    WHERE
        c.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
        c.id = 4696066816
  UNION ALL
    SELECT c.id, c.start_time_us, c.end_time_us, ct.method_path || ' > ' || m.signature
    FROM CallTree ct
    INNER JOIN function_call_site c ON (ct.id = c.parent_call_id)
    INNER JOIN function_call_method m ON
        (c.session_id = m.session_id AND c.process_id = m.process_id AND c.method_id = m.id)
    WHERE c.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122'
)
SELECT * FROM CallTree;

Alternatively, supposed you wanted to retrieve the chain of call sites that ultimately led to the call site with <code class="dcode">id</code> 4701240496 being called (i.e., a bottom up traversal from some particular node to its corresponding root node):


WITH RECURSIVE CallTree
    (id, start_time_us, end_time_us, parent_call_id)
AS (
    SELECT c.id, c.start_time_us, c.end_time_us, parent_call_id
    FROM function_call_site c
    WHERE
        c.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122' AND
        c.id = 4701240496
  UNION ALL
    SELECT c.id, c.start_time_us, c.end_time_us, c.parent_call_id
    FROM CallTree ct
    INNER JOIN function_call_site c ON (ct.parent_call_id = c.id)
    WHERE c.session_id = 'd3bf5f07-e910-11eb-a576-acde48001122'
)
SELECT * FROM CallTree;

Legacy Table (User-Flow-Specific Table)

For historic reasons, HeadSpin also exports a custom table for each user flow that is exported to Replica. The name of the table is generated from the name of the User Flow and this table has a different column for each Performance Measurement found among sessions in the User Flow. In practice, however, querying this table is hard because it requires knowing the schema and which measurements are actually valid for the User Flow beforehand.

As such, this table is marked as deprecated; all of the same information can be returned by querying the shared tables above as needed. For example, to list all of the (distinct) Performance Measurements for all of the sessions in the User Flow, use the following query:


SELECT DISTINCT pm.key_name, pm.key_type, pm.key_units
FROM performance_measurements AS pm
JOIN session_metadata AS sm ON pm.session_id = sm.session_id
WHERE sm.user_flow_id = '<user_flow_id>'

If a session does not have some performance measurement, then that entry will not exist in the <code class="dcode">performance_measurements</code> table. The table currently is still exported while some customer work flows use it, but it should not be relied upon.