Projects

The trailheads below lead into the projects and code that make up the Firefox Data Platform.

Telemetry APIs

| Name and repo | Description |
|---|---|
| python_moztelemetry | Python APIs for Mozilla Telemetry |
| moztelemetry | Scala APIs for Mozilla Telemetry |
| spark-hyperloglog | Algebird's HyperLogLog support for Apache Spark |

ETL code and Datasets

| Name and repo | Description |
|---|---|
| telemetry-batch-view | Scala ETL code for derived datasets |
| python_mozetl | Python ETL code for derived datasets |
| telemetry-airflow | Airflow configuration and DAGs for scheduled jobs |
| python_mozaggregator | Aggregation job for telemetry.mozilla.org aggregates |
| telemetry-streaming | Spark Streaming ETL jobs for Mozilla Telemetry |

See also firefox-data-docs for documentation on datasets.

Infrastructure

| Name and repo | Description |
|---|---|
| mozilla-pipeline-schemas | JSON and Parquet schemas for Mozilla Telemetry and other structured data |
| hindsight | Real-time data processing |
| lua_sandbox | Generic sandbox for safe data analysis |
| lua_sandbox_extensions | Modules and packages that extend the Lua sandbox |
| nginx_moz_ingest | Nginx module for Telemetry data ingestion |
| puppet-config | Cloud services Puppet configuration for deploying infrastructure |
| parquet2hive | Hive import statement generator for Parquet datasets |

EMR Bootstrap scripts

| Name and repo | Description |
|---|---|
| emr-bootstrap-spark | AWS bootstrap scripts for Spark |
| emr-bootstrap-presto | AWS bootstrap scripts for Presto |

Data applications

| Name and repo | Description |
|---|---|
| telemetry.mozilla.org | Main entry point for viewing aggregate Telemetry data |
| Cerberus & Medusa | Automatic alert system for telemetry aggregates |
| analysis.t.m.o | Self-serve data analysis platform |
| Mission Control | Low-latency dashboard for stability and health metrics |
| Experiments Viewer | Visualization for Shield experiment results |
| Re:dash | Mozilla's fork of the data query / visualization system |
| TAAR | Telemetry-aware addon recommender |
| Ensemble | A minimalist platform for publishing data |
| Hardware Report | Firefox Hardware Report |
| python-zeppelin | Convert Zeppelin notebooks to Markdown |
| St. Mocli | A command-line interface to STMO |
| probe-scraper | Scrape and publish Telemetry probe data from Firefox |
| test-tube | Compare data across branches in experiments |
| experimenter | A web application for managing experiments |
| St. Moab | Automatically generate Re:dash dashboards for A/B experiments |

Reference materials

Public

| Name and repo | Description |
|---|---|
| firefox-data-docs | All the info you need to answer questions about Firefox users with data |
| Firefox source docs | Mozilla Source Tree Docs - Telemetry section |
| reports.t.m.o | Knowledge repository for public reports |

Non-public

| Name and repo | Description |
|---|---|
| Fx-Data-Planning | Quarterly goals and internal documentation |