Socorro Crash Reports

Introduction

Public crash statistics for Firefox are available through the Data Platform in the socorro_crash dataset. Crash data from Socorro is sanitized and made available to ATMO and STMO. A nightly import job converts batches of JSON documents into a columnar format (Parquet) using the associated JSON Schema.
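To make "columnar" concrete, here is a minimal sketch. It is not the actual import job. It assumes a simplified, flat JSON Schema whose `properties` object lists the columns, and it pivots a batch of newline-delimited JSON documents into one list per column. The helper name `to_columns` and the sample field values are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List


def to_columns(schema: Dict[str, Any], lines: Iterable[str]) -> Dict[str, List[Any]]:
    """Pivot newline-delimited JSON documents into per-column lists.

    Hypothetical helper: only top-level properties of a flat JSON Schema are
    handled, and fields not declared in the schema are dropped.
    """
    # One output column per property declared in the schema, in schema order.
    columns: Dict[str, List[Any]] = {name: [] for name in schema.get("properties", {})}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines in the batch
        doc = json.loads(line)
        for name, values in columns.items():
            # A field missing from a document becomes None (a null), so every
            # column has exactly one entry per document.
            values.append(doc.get(name))
    return columns


schema = {"type": "object", "properties": {"uptime": {"type": "integer"}, "reason": {"type": "string"}}}
batch = ['{"uptime": 12, "reason": "EXCEPTION_ACCESS_VIOLATION_READ"}', '{"uptime": 30}']
print(to_columns(schema, batch))
# {'uptime': [12, 30], 'reason': ['EXCEPTION_ACCESS_VIOLATION_READ', None]}
```

Storing each column contiguously like this is what lets Parquet readers scan only the columns a query touches.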


Accessing the Data

The dataset is stored in Parquet format at s3://telemetry-parquet/socorro_crash/v2. It is also registered in Athena and Presto under the table name socorro_crash.
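From a Spark session (for example on ATMO), the Parquet files can also be read directly. The sketch below assumes an existing `pyspark.sql.SparkSession` passed in as `spark`, and that `crash_date` is a `YYYYMMDD` string, which matches how the example query filters it. The helper names `crash_date_key` and `load_crashes` are hypothetical.

```python
from datetime import date

DATASET_PATH = "s3://telemetry-parquet/socorro_crash/v2"


def crash_date_key(day: date) -> str:
    """Format a date as YYYYMMDD, the form crash_date takes in the SQL example."""
    return day.strftime("%Y%m%d")


def load_crashes(spark, day: date):
    """Read the socorro_crash Parquet data and keep a single crash_date.

    `spark` is assumed to be an existing pyspark.sql.SparkSession.
    """
    df = spark.read.parquet(DATASET_PATH)
    # If the data is partitioned by crash_date, Spark can prune partitions
    # with this equality filter; otherwise it is an ordinary row filter.
    return df.where(df.crash_date == crash_date_key(day))
```

For example, `load_crashes(spark, date(2018, 5, 20)).count()` would count the crashes for the same day the SQL example uses.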

Data Reference

Example

The dataset can be queried using SQL. For example, the following query counts crashes and summarizes uptime (mean, standard deviation, and quartiles) by crash date and reason for a single day.

SELECT crash_date,
       reason,
       count(*) as n_crashes,
       avg(uptime) as avg_uptime,
       stddev(uptime) as stddev_uptime,
       approx_percentile(uptime, ARRAY [0.25, 0.5, 0.75]) as qntl_uptime
FROM socorro_crash
WHERE crash_date='20180520'
GROUP BY 1,
         2
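To check what the query computes on a small local sample, the same grouping can be mimicked in plain Python. This is a sketch that assumes rows are dicts with `crash_date`, `reason`, and non-null `uptime` keys. It uses the sample standard deviation, because Presto's `stddev` is an alias of `stddev_samp`. It uses a simple nearest-rank percentile instead of `approx_percentile`, which only approximates percentiles, so results on real data may differ slightly. The names `summarize` and `nearest_rank` are hypothetical.

```python
import math
import statistics
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def nearest_rank(sorted_values: List[float], p: float) -> float:
    """Nearest-rank percentile of an already sorted, non-empty list."""
    idx = max(math.ceil(p * len(sorted_values)) - 1, 0)
    return sorted_values[idx]


def summarize(rows: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Group rows by (crash_date, reason) and summarize uptime like the SQL example."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for row in rows:
        groups[(row["crash_date"], row["reason"])].append(row["uptime"])

    result = {}
    for key, uptimes in groups.items():
        ordered = sorted(uptimes)
        result[key] = {
            "n_crashes": len(uptimes),
            "avg_uptime": statistics.mean(uptimes),
            # A sample standard deviation is undefined for a single row.
            "stddev_uptime": statistics.stdev(uptimes) if len(uptimes) > 1 else None,
            "qntl_uptime": [nearest_rank(ordered, p) for p in (0.25, 0.5, 0.75)],
        }
    return result
```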


Scheduling

The job is scheduled nightly on Airflow. The DAG is defined in mozilla/telemetry-airflow:/dags/socorro_import.py.

Schema

The source JSON Schema is available in the mozilla/socorro GitHub repository. The import job transforms this schema into a Spark SQL structure, converts column names from camelCase to snake_case, and then serializes the data to Parquet.
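A minimal sketch of those two transformations follows. It is not the notebook's code. The type mapping (for example, `integer` to `bigint`) and the fallback to `string` for unknown types are assumptions. The output follows Spark's `simpleString` notation, such as `struct<name:type,...>`. The helper names `camel_to_snake` and `spark_type` are hypothetical.

```python
import re
from typing import Any, Dict

# Two-pass camelCase -> snake_case. The first pass splits before a capitalized
# word ("crashDate" -> "crash_Date", "CPUInfo" -> "CPU_Info"). The second pass
# splits a lower-case letter or digit from a following capital ("buildID" -> "build_ID").
_WORD_BOUNDARY = re.compile(r"(.)([A-Z][a-z]+)")
_ACRONYM_BOUNDARY = re.compile(r"([a-z0-9])([A-Z])")

# Assumed mapping from JSON Schema primitive types to Spark SQL type names.
_PRIMITIVES = {"string": "string", "integer": "bigint", "number": "double", "boolean": "boolean"}


def camel_to_snake(name: str) -> str:
    return _ACRONYM_BOUNDARY.sub(r"\1_\2", _WORD_BOUNDARY.sub(r"\1_\2", name)).lower()


def spark_type(schema: Dict[str, Any]) -> str:
    """Render a JSON Schema node as a Spark SQL type string with snake_case field names."""
    json_type = schema.get("type", "string")
    if isinstance(json_type, list):
        # e.g. ["integer", "null"]: Spark fields are nullable by default, so keep the real type.
        non_null = [t for t in json_type if t != "null"]
        json_type = non_null[0] if non_null else "string"
    if json_type == "object":
        fields = ",".join(
            f"{camel_to_snake(name)}:{spark_type(child)}"
            for name, child in schema.get("properties", {}).items()
        )
        return f"struct<{fields}>"
    if json_type == "array":
        return f"array<{spark_type(schema.get('items', {}))}>"
    return _PRIMITIVES.get(json_type, "string")
```

For example, an object with properties `crashDate` (string) and `uptime` (`["integer", "null"]`) renders as `struct<crash_date:string,uptime:bigint>`.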

Code Reference

The code is a notebook in the mozilla-services/data-pipeline repository.