Main Summary
Introduction
The main_summary table is the most direct representation of a main ping
but can be difficult to work with due to its size.
Prefer the longitudinal dataset unless using the sampled data is prohibitive.
Contents
The main_summary table contains one row for each ping.
Each column represents one field from the main ping payload,
though only a subset of all main ping fields are included.
This dataset does not include histograms.
Background and Caveats
This table is massive, and due to its size, it can be difficult to work with.
You should avoid querying main_summary from re:dash.
Your queries will be slow to complete and can impact performance for other users,
since re:dash on a shared cluster.
Instead, we recommend using the longitudinal or clients_daily dataset where possible.
If these datasets do not suffice, consider using Spark on an
ATMO cluster.
In the odd case where these queries are necessary,
make use of the sample_id field and limit to a short submission date range.
Accessing the Data
The data is stored as a parquet table in S3 at the following address. See this cookbook to get started working with the data in Spark.
s3://telemetry-parquet/main_summary/v4/
Though not recommended main_summary is accessible through re:dash.
Here's an example query.
Your queries will be slow to complete and can impact performance for other users,
since re:dash is on a shared cluster.
Further Reading
The technical documentation for main_summary is located in the
telemetry-batch-view documentation.
The code responsible for generating this dataset is here
Adding New Fields
We support a few basic types that can be easily added to main_summary.
Non-addon scalars are automatically added to main_summary.
User Preferences
These are added in the userPrefsList, near the top of the
Main Summary file.
They must be available in the ping environment
to be included here. There is more information in the file itself.
Once added, they will show as top-level fields, with the string user_pref prepended. For example, IntegerUserPref("dom.ipc.processCount") becomes user_pref_dom_ipc_processcount.
Histograms
Histograms can simply be added to the histogramsWhitelist near the top of
Main Summary file.
Simply add the name of the histogram in the alphabetically-sorted position in the list.
Each process a histogram is recorded in will have a column in main_summary, with the string histogram_ prepended. For example, CYCLE_COLLECTOR_MAX_PAUSE
is recorded in the parent, content, and gpu processes (according to the definition).
It will then result in three columns:
histogram_parent_cycle_collector_max_pausehistogram_content_cycle_collector_max_pausehistogram_gpu_cycle_collector_max_pause
Addon Scalars
Addon scalars are recorded by an addon. To include one of these, add the definition to the addon scalars definition file in telemetry-batch-view. Be sure to include the section:
record_in_processes:
- 'dynamic'
The addon scalars can then be found in the associated column, depending on their type:
string_addon_scalarskeyed_string_addon_scalarsuint_addon_scalarskeyed_uint_addon_scalarsboolean_addon_scalarskeyed_boolean_addon_scalars
These columns are all maps. Each addon scalar will be a key within that map, concatenating the top-level subsection within Scalars.yaml with its name to get the key. As an example, consider the following scalar definition:
test:
misunderestimated_nucular:
description: A test scalar, no soup for you!
expires: never
kind: string
keyed: true
notification_emails:
- frank@mozilla.com
record_in_processes:
- 'dynamic'
For example, you could find the addon scalar test.misunderestimated_nucular, a keyed string scalar, using keyed_string_addon_scalars['test_misunderestimated_nucular'].
In general, use element_at, which returns NULL when the key is not found: element_at(keyed_string_addon_scalars, 'test_misunderestimated_nucular')
Other Fields
We can include other types of fields as well, for example if there needs to be a specific transformation done. We do need the data to be available in the Main Ping
Data Reference
Example Queries
We recommend working with this dataset via Spark rather than sql.t.m.o.
Due to the large number of records,
queries can consume a lot of resources on the
shared cluster and impact other users.
Queries via sql.t.m.o should limit to a short submission_date_s3 range,
and ideally make use of the sample_id field.
When using Presto to query the data from sql.t.m.o,
you can use the UNNEST feature to access items in the
search_counts, popup_notification_stats and active_addons fields.
For example, to compare the search volume for different search source values, you could use:
WITH search_data AS (
SELECT
s.source AS search_source,
s.count AS search_count
FROM
main_summary
CROSS JOIN UNNEST(search_counts) AS t(s)
WHERE
submission_date_s3 = '20160510'
AND sample_id = '42'
AND search_counts IS NOT NULL
)
SELECT
search_source,
sum(search_count) as total_searches
FROM search_data
GROUP BY search_source
ORDER BY sum(search_count) DESC
Sampling
The main_summary dataset contains one record for each main ping
as long as the record contains a non-null value for
documentId, submissionDate, and Timestamp.
We do not ever expect nulls for these fields.
Scheduling
This dataset is updated daily via the telemetry-airflow infrastructure. The job DAG runs every day shortly after midnight UTC. You can find the job definition here
Schema
As of 2017-12-03, the current version of the main_summary dataset is v4, and has a schema as follows:
root
|-- document_id: string (nullable = false)
|-- client_id: string (nullable = true)
|-- channel: string (nullable = true)
|-- normalized_channel: string (nullable = true)
|-- normalized_os_version: string (nullable = true)
|-- country: string (nullable = true)
|-- city: string (nullable = true)
|-- geo_subdivision1: string (nullable = true)
|-- geo_subdivision2: string (nullable = true)
|-- os: string (nullable = true)
|-- os_version: string (nullable = true)
|-- os_service_pack_major: long (nullable = true)
|-- os_service_pack_minor: long (nullable = true)
|-- windows_build_number: long (nullable = true)
|-- windows_ubr: long (nullable = true)
|-- install_year: long (nullable = true)
|-- is_wow64: boolean (nullable = true)
|-- memory_mb: integer (nullable = true)
|-- cpu_count: integer (nullable = true)
|-- cpu_cores: integer (nullable = true)
|-- cpu_vendor: string (nullable = true)
|-- cpu_family: integer (nullable = true)
|-- cpu_model: integer (nullable = true)
|-- cpu_stepping: integer (nullable = true)
|-- cpu_l2_cache_kb: integer (nullable = true)
|-- cpu_l3_cache_kb: integer (nullable = true)
|-- cpu_speed_mhz: integer (nullable = true)
|-- gfx_features_d3d11_status: string (nullable = true)
|-- gfx_features_d2d_status: string (nullable = true)
|-- gfx_features_gpu_process_status: string (nullable = true)
|-- gfx_features_advanced_layers_status: string (nullable = true)
|-- apple_model_id: string (nullable = true)
|-- antivirus: array (nullable = true)
| |-- element: string (containsNull = false)
|-- antispyware: array (nullable = true)
| |-- element: string (containsNull = false)
|-- firewall: array (nullable = true)
| |-- element: string (containsNull = false)
|-- profile_creation_date: long (nullable = true)
|-- profile_reset_date: long (nullable = true)
|-- previous_build_id: string (nullable = true)
|-- session_id: string (nullable = true)
|-- subsession_id: string (nullable = true)
|-- previous_session_id: string (nullable = true)
|-- previous_subsession_id: string (nullable = true)
|-- session_start_date: string (nullable = true)
|-- subsession_start_date: string (nullable = true)
|-- session_length: long (nullable = true)
|-- subsession_length: long (nullable = true)
|-- subsession_counter: integer (nullable = true)
|-- profile_subsession_counter: integer (nullable = true)
|-- creation_date: string (nullable = true)
|-- distribution_id: string (nullable = true)
|-- submission_date: string (nullable = false)
|-- sync_configured: boolean (nullable = true)
|-- sync_count_desktop: integer (nullable = true)
|-- sync_count_mobile: integer (nullable = true)
|-- app_build_id: string (nullable = true)
|-- app_display_version: string (nullable = true)
|-- app_name: string (nullable = true)
|-- app_version: string (nullable = true)
|-- timestamp: long (nullable = false)
|-- env_build_id: string (nullable = true)
|-- env_build_version: string (nullable = true)
|-- env_build_arch: string (nullable = true)
|-- e10s_enabled: boolean (nullable = true)
|-- e10s_multi_processes: long (nullable = true)
|-- locale: string (nullable = true)
|-- update_channel: string (nullable = true)
|-- update_enabled: boolean (nullable = true)
|-- update_auto_download: boolean (nullable = true)
|-- attribution: struct (nullable = true)
| |-- source: string (nullable = true)
| |-- medium: string (nullable = true)
| |-- campaign: string (nullable = true)
| |-- content: string (nullable = true)
|-- sandbox_effective_content_process_level: integer (nullable = true)
|-- active_experiment_id: string (nullable = true)
|-- active_experiment_branch: string (nullable = true)
|-- reason: string (nullable = true)
|-- timezone_offset: integer (nullable = true)
|-- plugin_hangs: integer (nullable = true)
|-- aborts_plugin: integer (nullable = true)
|-- aborts_content: integer (nullable = true)
|-- aborts_gmplugin: integer (nullable = true)
|-- crashes_detected_plugin: integer (nullable = true)
|-- crashes_detected_content: integer (nullable = true)
|-- crashes_detected_gmplugin: integer (nullable = true)
|-- crash_submit_attempt_main: integer (nullable = true)
|-- crash_submit_attempt_content: integer (nullable = true)
|-- crash_submit_attempt_plugin: integer (nullable = true)
|-- crash_submit_success_main: integer (nullable = true)
|-- crash_submit_success_content: integer (nullable = true)
|-- crash_submit_success_plugin: integer (nullable = true)
|-- shutdown_kill: integer (nullable = true)
|-- active_addons_count: long (nullable = true)
|-- flash_version: string (nullable = true)
|-- vendor: string (nullable = true)
|-- is_default_browser: boolean (nullable = true)
|-- default_search_engine_data_name: string (nullable = true)
|-- default_search_engine_data_load_path: string (nullable = true)
|-- default_search_engine_data_origin: string (nullable = true)
|-- default_search_engine_data_submission_url: string (nullable = true)
|-- default_search_engine: string (nullable = true)
|-- devtools_toolbox_opened_count: integer (nullable = true)
|-- client_submission_date: string (nullable = true)
|-- client_clock_skew: long (nullable = true)
|-- client_submission_latency: long (nullable = true)
|-- places_bookmarks_count: integer (nullable = true)
|-- places_pages_count: integer (nullable = true)
|-- push_api_notify: integer (nullable = true)
|-- web_notification_shown: integer (nullable = true)
|-- popup_notification_stats: map (nullable = true)
| |-- key: string
| |-- value: struct (valueContainsNull = true)
| | |-- offered: integer (nullable = true)
| | |-- action_1: integer (nullable = true)
| | |-- action_2: integer (nullable = true)
| | |-- action_3: integer (nullable = true)
| | |-- action_last: integer (nullable = true)
| | |-- dismissal_click_elsewhere: integer (nullable = true)
| | |-- dismissal_leave_page: integer (nullable = true)
| | |-- dismissal_close_button: integer (nullable = true)
| | |-- dismissal_not_now: integer (nullable = true)
| | |-- open_submenu: integer (nullable = true)
| | |-- learn_more: integer (nullable = true)
| | |-- reopen_offered: integer (nullable = true)
| | |-- reopen_action_1: integer (nullable = true)
| | |-- reopen_action_2: integer (nullable = true)
| | |-- reopen_action_3: integer (nullable = true)
| | |-- reopen_action_last: integer (nullable = true)
| | |-- reopen_dismissal_click_elsewhere: integer (nullable = true)
| | |-- reopen_dismissal_leave_page: integer (nullable = true)
| | |-- reopen_dismissal_close_button: integer (nullable = true)
| | |-- reopen_dismissal_not_now: integer (nullable = true)
| | |-- reopen_open_submenu: integer (nullable = true)
| | |-- reopen_learn_more: integer (nullable = true)
|-- search_counts: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- engine: string (nullable = true)
| | |-- source: string (nullable = true)
| | |-- count: long (nullable = true)
|-- active_addons: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- addon_id: string (nullable = false)
| | |-- blocklisted: boolean (nullable = true)
| | |-- name: string (nullable = true)
| | |-- user_disabled: boolean (nullable = true)
| | |-- app_disabled: boolean (nullable = true)
| | |-- version: string (nullable = true)
| | |-- scope: integer (nullable = true)
| | |-- type: string (nullable = true)
| | |-- foreign_install: boolean (nullable = true)
| | |-- has_binary_components: boolean (nullable = true)
| | |-- install_day: integer (nullable = true)
| | |-- update_day: integer (nullable = true)
| | |-- signed_state: integer (nullable = true)
| | |-- is_system: boolean (nullable = true)
| | |-- is_web_extension: boolean (nullable = true)
| | |-- multiprocess_compatible: boolean (nullable = true)
|-- disabled_addons_ids: array (nullable = true)
| |-- element: string (containsNull = false)
|-- active_theme: struct (nullable = true)
| |-- addon_id: string (nullable = false)
| |-- blocklisted: boolean (nullable = true)
| |-- name: string (nullable = true)
| |-- user_disabled: boolean (nullable = true)
| |-- app_disabled: boolean (nullable = true)
| |-- version: string (nullable = true)
| |-- scope: integer (nullable = true)
| |-- type: string (nullable = true)
| |-- foreign_install: boolean (nullable = true)
| |-- has_binary_components: boolean (nullable = true)
| |-- install_day: integer (nullable = true)
| |-- update_day: integer (nullable = true)
| |-- signed_state: integer (nullable = true)
| |-- is_system: boolean (nullable = true)
| |-- is_web_extension: boolean (nullable = true)
| |-- multiprocess_compatible: boolean (nullable = true)
|-- blocklist_enabled: boolean (nullable = true)
|-- addon_compatibility_check_enabled: boolean (nullable = true)
|-- telemetry_enabled: boolean (nullable = true)
|-- user_prefs: struct (nullable = true)
| |-- dom_ipc_process_count: integer (nullable = true)
| |-- extensions_allow_non_mpc_extensions: boolean (nullable = true)
|-- events: array (nullable = true)
| |-- element: struct (containsNull = false)
| | |-- timestamp: long (nullable = false)
| | |-- category: string (nullable = false)
| | |-- method: string (nullable = false)
| | |-- object: string (nullable = false)
| | |-- string_value: string (nullable = true)
| | |-- map_values: map (nullable = true)
| | | |-- key: string
| | | |-- value: string (valueContainsNull = true)
|-- ssl_handshake_result_success: integer (nullable = true)
|-- ssl_handshake_result_failure: integer (nullable = true)
|-- ssl_handshake_result: map (nullable = true)
| |-- key: string
| |-- value: integer (valueContainsNull = true)
|-- active_ticks: integer (nullable = true)
|-- main: integer (nullable = true)
|-- first_paint: integer (nullable = true)
|-- session_restored: integer (nullable = true)
|-- total_time: integer (nullable = true)
|-- plugins_notification_shown: integer (nullable = true)
|-- plugins_notification_user_action: struct (nullable = true)
| |-- allow_now: integer (nullable = true)
| |-- allow_always: integer (nullable = true)
| |-- block: integer (nullable = true)
|-- plugins_infobar_shown: integer (nullable = true)
|-- plugins_infobar_block: integer (nullable = true)
|-- plugins_infobar_allow: integer (nullable = true)
|-- plugins_infobar_dismissed: integer (nullable = true)
|-- experiments: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- search_cohort: string (nullable = true)
|-- gfx_compositor: string (nullable = true)
|-- quantum_ready: boolean (nullable = true)
|-- gc_max_pause_ms_main_above_150: long (nullable = true)
|-- gc_max_pause_ms_main_above_250: long (nullable = true)
|-- gc_max_pause_ms_main_above_2500: long (nullable = true)
|-- gc_max_pause_ms_content_above_150: long (nullable = true)
|-- gc_max_pause_ms_content_above_250: long (nullable = true)
|-- gc_max_pause_ms_content_above_2500: long (nullable = true)
|-- cycle_collector_max_pause_main_above_150: long (nullable = true)
|-- cycle_collector_max_pause_main_above_250: long (nullable = true)
|-- cycle_collector_max_pause_main_above_2500: long (nullable = true)
|-- cycle_collector_max_pause_content_above_150: long (nullable = true)
|-- cycle_collector_max_pause_content_above_250: long (nullable = true)
|-- cycle_collector_max_pause_content_above_2500: long (nullable = true)
|-- input_event_response_coalesced_ms_main_above_150: long (nullable = true)
|-- input_event_response_coalesced_ms_main_above_250: long (nullable = true)
|-- input_event_response_coalesced_ms_main_above_2500: long (nullable = true)
|-- input_event_response_coalesced_ms_content_above_150: long (nullable = true)
|-- input_event_response_coalesced_ms_content_above_250: long (nullable = true)
|-- input_event_response_coalesced_ms_content_above_2500: long (nullable = true)
|-- ghost_windows_main_above_1: long (nullable = true)
|-- ghost_windows_content_above_1: long (nullable = true)
|-- user_pref_dom_ipc_plugins_sandbox_level_flash: integer (nullable = true)
|-- user_pref_dom_ipc_processcount: integer (nullable = true)
|-- user_pref_extensions_allow_non_mpc_extensions: boolean (nullable = true)
|-- user_pref_extensions_legacy_enabled: boolean (nullable = true)
|-- user_pref_browser_search_widget_innavbar: boolean (nullable = true)
|-- user_pref_general_config_filename: string (nullable = true)
|-- ** dynamically included scalar fields, see source **
|-- ** dynamically included whitelisted histograms, see source **
|-- boolean_addon_scalars: map (nullable = true)
| |-- key: string
| |-- value: boolean (valueContainsNull = true)
|-- keyed_boolean_addon_scalars: map (nullable = true)
| |-- key: string
| |-- value: map (valueContainsNull = true)
| | |-- key: string
| | |-- value: boolean (valueContainsNull = true)
|-- string_addon_scalars: map (nullable = true)
| |-- key: string
| |-- value: string (valueContainsNull = true)
|-- keyed_string_addon_scalars: map (nullable = true)
| |-- key: string
| |-- value: map (valueContainsNull = true)
| | |-- key: string
| | |-- value: string (valueContainsNull = true)
|-- uint_addon_scalars: map (nullable = true)
| |-- key: string
| |-- value: integer (valueContainsNull = true)
|-- keyed_uint_addon_scalars: map (nullable = true)
| |-- key: string
| |-- value: map (valueContainsNull = true)
| | |-- key: string
| | |-- value: integer (valueContainsNull = true)
|-- submission_date_s3: string (nullable = true)
|-- sample_id: string (nullable = true)
For more detail on where these fields come from in the
raw data,
please look
in the MainSummaryView code.
in the buildSchema function.
Most of the fields are simple scalar values, with a few notable exceptions:
- The
search_countfield is an array of structs, each item in the array representing a 3-tuple of (engine,source,count). Theenginefield represents the name of the search engine against which the searches were done. Thesourcefield represents the part of the Firefox UI that was used to perform the search. It contains values such asabouthome,urlbar, andsearchbar. Thecountfield contains the number of searches performed against this engine+source combination during that subsession. Any of the fields in the struct may be null (for example if the search key did not match the expected pattern, or if the count was non-numeric). - The
loop_activity_counterfield is a simple struct containing inner fields for each expected value of theLOOP_ACTIVITY_COUNTEREnumerated Histogram. Each inner field is a count for that histogram bucket. - The
popup_notification_statsfield is a map ofStringkeys to struct values, each field in the struct being a count for the expected values of thePOPUP_NOTIFICATION_STATSKeyed Enumerated Histogram. - The
places_bookmarks_countandplaces_pages_countfields contain the mean value of the corresponding Histogram, which can be interpreted as the average number of bookmarks or pages in a given subsession. - The
active_addonsfield contains an array of structs, one for each entry in theenvironment.addons.activeAddonssection of the payload. More detail in Bug 1290181. - The
disabled_addons_idsfield contains an array of strings, one for each entry in thepayload.addonDetailswhich is not already reported in theenvironment.addons.activeAddonssection of the payload. More detail in Bug 1390814. Please note that while using this field is generally OK, this was introduced to support the TAAR project and you should not count on it in the future. The field can stay in themain_summary, but we might need to slightly change the ping structure to something better thanpayload.addonDetails. - The
themefield contains a single struct in the same shape as the items in theactive_addonsarray. It contains information about the currently active browser theme. - The
user_prefsfield contains a struct with values for preferences of interest. - The
eventsfield contains an array of event structs. - Dynamically-included histogram fields are present as key->value maps, or key->(key->value) nested maps for keyed histograms.
Time formats
Columns in main_summary may use one of a handful of time formats with different precisions:
| Column Name | Origin | Description | Example | Spark | Presto |
|---|---|---|---|---|---|
timestamp | stamped at ingestion | nanoseconds since epoch | 1504689165972861952 | from_unixtime(timestamp/1e9) | from_unixtime(timestamp/1e9) |
submission_date_s3 | derived from timestamp | YYYYMMDD date string of timestamp | 20170906 | from_unixtime(unix_timestamp(submission_date, 'yyyyMMdd')) | date_parse(submission_date, '%Y%m%d') |
client_submission_date | derived from HTTP header: Fields[Date] | HTTP date header string sent with the ping | Tue, 27 Sep 2016 16:28:23 GMT | unix_timestamp(client_submission_date, 'EEE, dd M yyyy HH:mm:ss zzz') | date_parse(substr(client_submission_date, 1, 25), '%a, %d %b %Y %H:%i:%s') |
creation_date | creationDate | time of ping creation ISO8601 at UTC+0 | 2017-09-06T08:21:36.002Z | to_timestamp(creation_date, "yyyy-MM-dd'T'HH:mm:ss.SSSXXX") | from_iso8601_timestamp(creation_date) AT TIME ZONE 'GMT' |
timezone_offset | info.timezoneOffset | timezone offset in minutes | 120 | ||
subsession_start_date | info.subsessionStartDate | hourly precision, ISO8601 date in local time | 2017-09-06T00:00:00.0+02:00 | from_iso8601_timestamp(subsession_start_date) AT TIME ZONE 'GMT' | |
subsession_length | info.subsessionLength | subsession length in seconds | 599 | date_add('second', subsession_length, subsession_start_date) | |
profile_creation_date | environment.profile.creationDate | days since epoch | 15,755 | from_unixtime(profile_creation_date * 86400) |
Code Reference
This dataset is generated by telemetry-batch-view. Refer to this repository for information on how to run or augment the dataset.