Socorro Crash Reports
Introduction
Public crash statistics for Firefox are available through the Data Platform in the socorro_crash dataset.
The crash data in Socorro is sanitized and made available to ATMO and STMO.
A nightly import job converts batches of JSON documents into a columnar format using the associated JSON Schema.
Accessing the Data
The dataset is available in Parquet at s3://telemetry-parquet/socorro_crash/v2.
It is also indexed with Athena and Presto under the table name socorro_crash.
Data Reference
Example
The dataset can be queried using SQL. For example, we can count the number of crashes and summarize uptime (mean, standard deviation, and quartiles) by date and reason.
SELECT crash_date,
       reason,
       count(*) AS n_crashes,
       avg(uptime) AS avg_uptime,
       stddev(uptime) AS stddev_uptime,
       approx_percentile(uptime, ARRAY [0.25, 0.5, 0.75]) AS qntl_uptime
FROM socorro_crash
WHERE crash_date = '20180520'
GROUP BY 1, 2
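To run the same aggregation for another day, the query text can be generated from a date. The following is a minimal sketch, assuming crash_date is always a YYYYMMDD string as in the example; the helper name build_crash_query is hypothetical and not part of any Mozilla tooling.

```python
from datetime import date

# Same aggregation as the example query, with the date left as a placeholder.
QUERY_TEMPLATE = """
SELECT crash_date,
       reason,
       count(*) AS n_crashes,
       avg(uptime) AS avg_uptime,
       stddev(uptime) AS stddev_uptime,
       approx_percentile(uptime, ARRAY [0.25, 0.5, 0.75]) AS qntl_uptime
FROM socorro_crash
WHERE crash_date = '{crash_date}'
GROUP BY 1, 2
"""


def build_crash_query(day):
    """Return the aggregation query for a single day (hypothetical helper)."""
    # Assumption: crash_date is stored as a YYYYMMDD string, e.g. '20180520'.
    return QUERY_TEMPLATE.format(crash_date=day.strftime("%Y%m%d")).strip()


query = build_crash_query(date(2018, 5, 20))
print(query)
```

Because the only interpolated value comes from strftime on a date object, the generated string contains nothing but digits in the filter. The resulting text can be pasted into STMO or submitted through whichever Athena or Presto client is in use.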
Scheduling
The job is scheduled to run nightly on Airflow.
The DAG is available under mozilla/telemetry-airflow:/dags/socorro_import.py.
Schema
The source schema is available on the mozilla/socorro GitHub repository.
This schema is transformed into a Spark-SQL structure and serialized to Parquet after transforming column names from camelCase to snake_case.
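The schema transformation can be illustrated with a short sketch. This is not the import job's actual code: the type mapping, the helper names, and the field names in example_schema are assumptions chosen for illustration. It walks a JSON Schema, renames properties from camelCase to snake_case, and emits a dictionary in the JSON layout that Spark uses to describe a struct type.

```python
import re

# Assumed mapping from JSON Schema primitive types to Spark SQL type names.
PRIMITIVES = {
    "string": "string",
    "integer": "long",
    "number": "double",
    "boolean": "boolean",
}


def to_snake_case(name):
    """Insert an underscore before each capital that follows a lowercase
    letter or digit, then lowercase the result."""
    return re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()


def to_spark_type(schema):
    """Translate a JSON Schema fragment into a Spark-style type description."""
    json_type = schema.get("type")
    # A type may be a list such as ["integer", "null"]; use the first non-null entry.
    if isinstance(json_type, list):
        non_null = [t for t in json_type if t != "null"]
        json_type = non_null[0] if non_null else "string"
    if json_type == "object":
        fields = [
            {
                "name": to_snake_case(key),
                "type": to_spark_type(sub_schema),
                "nullable": True,
                "metadata": {},
            }
            for key, sub_schema in sorted(schema.get("properties", {}).items())
        ]
        return {"type": "struct", "fields": fields}
    if json_type == "array":
        return {
            "type": "array",
            "elementType": to_spark_type(schema.get("items", {})),
            "containsNull": True,
        }
    # Unknown or missing types fall back to string in this sketch.
    return PRIMITIVES.get(json_type, "string")


# Hypothetical schema fragment; the real schema lives in mozilla/socorro.
example_schema = {
    "type": "object",
    "properties": {
        "crashDate": {"type": "string"},
        "uptime": {"type": ["integer", "null"]},
        "jsonDump": {
            "type": "object",
            "properties": {"crashingThread": {"type": "integer"}},
        },
    },
}

spark_schema = to_spark_type(example_schema)
print([field["name"] for field in spark_schema["fields"]])
```

With PySpark installed, a dictionary of this shape can be turned into a schema object with StructType.fromJson. The regular expression here treats a run of capitals as one word (buildID becomes build_id); the real job may handle such names differently.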
Code Reference
The code is a notebook in the mozilla-services/data-pipeline repository.