Working with Crash Pings

Here are some snippets to get you started querying crash pings with the Dataset API.

We can first load and instantiate a Dataset object to query the crash pings, and look at the possible fields to filter on:

from moztelemetry.dataset import Dataset
telem = Dataset.from_source("telemetry")
telem.schema
# => 'submissionDate, sourceName, sourceVersion, docType, appName, appUpdateChannel,
#     appVersion, appBuildId'

The more specific these filters are, the faster the data can be pulled. Each field can be filtered by either a value or a callable. For example, a version and date range can be specified with the v5758 and dates lambdas below:

v5758 = lambda x: x[:2] in ('57', '58')
dates = lambda x: '20180126' <= x <= '20180202'
telem = (
    Dataset.from_source("telemetry")
    .where(docType='crash', appName="Firefox", appUpdateChannel="release",
           appVersion=v5758, submissionDate=dates)
)
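To see what the two callables accept before running a full query, they can be tried on plain strings. The sketch below needs no cluster. The sample values are made up for illustration and are not taken from real pings. The date comparison works lexicographically only because the dates are fixed-width, zero-padded YYYYMMDD strings.

```python
# Same predicates as in the query above, exercised on hand-written samples.
v5758 = lambda x: x[:2] in ('57', '58')
dates = lambda x: '20180126' <= x <= '20180202'

# Hypothetical values, chosen to hit both sides of each filter.
sample_versions = ['57.0.4', '58.0', '56.0.2', '5.0']
sample_dates = ['20180125', '20180126', '20180130', '20180203']

# Keep only the values each predicate accepts.
matching_versions = [v for v in sample_versions if v5758(v)]
matching_dates = [d for d in sample_dates if dates(d)]

print(matching_versions)
print(matching_dates)
```

Both ends of the date range are inclusive, and a short version string such as '5.0' is rejected because its first two characters are '5.'.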

Now, referencing the docs for the crash ping, the desired fields can be selected and brought in as a Spark RDD named pings (sc below is the active SparkContext):

sel = (
    telem.select(
        os_name='environment.system.os.name',
        os_version='environment.system.os.version',
        app_version='application.version',
        app_architecture='application.architecture',
        clientId='clientId',
        creationDate='creationDate',
        submissionDate='meta.submissionDate',
        sample_id='meta.sampleId',
        modules='payload.stackTraces.modules',
        stackTraces='payload.stackTraces',
        oom_size='payload.metadata.OOMAllocationSize',
        AvailablePhysicalMemory='payload.metadata.AvailablePhysicalMemory',
        AvailableVirtualMemory='payload.metadata.AvailableVirtualMemory',
        TotalPhysicalMemory='payload.metadata.TotalPhysicalMemory',
        TotalVirtualMemory='payload.metadata.TotalVirtualMemory',
        reason='payload.metadata.MozCrashReason',
        payload='payload',
    )
)
pings = sel.records(sc)
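As a first pass over the result, one option is to tally crashes by reason. The sketch below assumes each record is a dict keyed by the names passed to select, and that a field missing from a ping comes back as None. The helper count_reasons and the sample records are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter

def count_reasons(records):
    """Count records per 'reason'; a missing or empty reason becomes 'unknown'."""
    return Counter(r.get('reason') or 'unknown' for r in records)

# Made-up records shaped like the selection above (most fields omitted).
sample_records = [
    {'os_name': 'Windows_NT', 'app_version': '58.0', 'reason': 'MOZ_CRASH(example)'},
    {'os_name': 'Windows_NT', 'app_version': '57.0.4', 'reason': None},
    {'os_name': 'Linux', 'app_version': '58.0', 'reason': 'MOZ_CRASH(example)'},
]

reason_counts = count_reasons(sample_records)
print(reason_counts.most_common())
```

On a small sample pulled from the RDD, for example with pings.take(1000), the same helper can be applied directly to the returned list.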