HTTP Edge Server Specification
This document specifies the behavior of the server that accepts submissions from any HTTP client, e.g. Firefox telemetry.
The original implementation of the HTTP Edge Server was tracked in Bug 1129222.
General Data Flow
HTTP submissions come in from the wild, hit a load balancer, then optionally an Nginx proxy, then the HTTP Edge Server described in this document. Data is accepted via a POST/PUT request from clients, which the server will wrap in a Heka message and forward to two places: the Services Data Pipeline, where any further processing, analysis, and storage will be handled; as well as to a short-lived S3 bucket which will act as a fail-safe in case there is a processing error and/or data loss within the main Data Pipeline.
Namespaces
Namespaces are used to control the processing of data from different types of clients, from the metadata that is collected to the destinations where the data is written, processed and accessible. Namespaces are configured in Nginx using a location directive. To request a new namespace, file a bug against the Data Platform Team with a short description of what the namespace will be used for and the desired configuration options. Data sent to a namespace that is not specifically configured is assumed to be in the non-Telemetry JSON format.
Forwarding to the pipeline
The constructed Heka protobuf message is written to disk and to the pub/sub pipeline (currently Kafka). The messages written to disk serve as a fail-safe: they are batched and written to S3 (landfill) when they reach a certain size or timeout.
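The size-or-timeout batching described above can be sketched as follows. This is a minimal illustration, not the edge server's implementation: the class name `LandfillBatcher`, the `flush` callback (standing in for the S3 upload), and the default thresholds are all hypothetical.

```python
import time
from typing import Callable, List, Optional


class LandfillBatcher:
    """Buffers serialized messages and flushes on a size or age threshold.

    Hypothetical sketch: `flush` stands in for the upload to S3 (landfill).
    """

    def __init__(self, flush: Callable[[bytes], None],
                 max_bytes: int = 1024, max_age_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._flush = flush
        self._max_bytes = max_bytes      # assumed size threshold
        self._max_age_s = max_age_s      # assumed roll timeout
        self._clock = clock
        self._buf: List[bytes] = []
        self._size = 0
        self._started: Optional[float] = None

    def add(self, message: bytes) -> None:
        """Append one message and roll the batch if a threshold is hit."""
        if self._started is None:
            self._started = self._clock()
        self._buf.append(message)
        self._size += len(message)
        self.maybe_roll()

    def maybe_roll(self) -> None:
        """Flush the current batch if it is too large or too old."""
        if self._started is None:
            return  # nothing buffered
        too_big = self._size >= self._max_bytes
        too_old = self._clock() - self._started >= self._max_age_s
        if too_big or too_old:
            self._flush(b"".join(self._buf))
            self._buf, self._size, self._started = [], 0, None
```

A real server would also call `maybe_roll()` periodically so that a quiet batch is still flushed once the timeout passes.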
Edge Server Heka Message Schema
- required binary Uuid; // Internal identifier randomly generated
- required int64 Timestamp; // Submission time (server clock)
- required string Hostname; // Hostname of the edge server e.g. ip-172-31-2-68
- required string Type; // Kafka topic name e.g. telemetry-raw
- required group Fields
  - required string uri; // Submission URI e.g. /submit/telemetry/6c49ec73-4350-45a0-9c8a-6c8f5aded0cf/main/Firefox/58.0.2/release/20180206200532
  - required binary content; // POST Body
  - required string protocol; // e.g. HTTP/1.1
  - optional string args; // Query parameters e.g. v=4
  - optional string remote_addr; // In our setup it is usually a load balancer e.g. 172.31.32.229
  - // HTTP Headers specified in the production edge server configuration
  - optional string Content-Length; // e.g. 4722
  - optional string Date; // e.g. Mon, 12 Mar 2018 00:02:18 GMT
  - optional string DNT; // e.g. 1
  - optional string Host; // e.g. incoming.telemetry.mozilla.org
  - optional string User-Agent; // e.g. pingsender/1.0
  - optional string X-Forwarded-For; // Last entry is treated as the client IP for geoIP lookup e.g. 10.98.132.74, 103.3.237.12
  - optional string X-PingSender-Version; // e.g. 1.0
- required string
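To make the schema concrete, here is a rough sketch of how a request could be mapped onto these fields. It uses a plain dataclass rather than the real Heka protobuf bindings. The names `EdgeMessage` and `build_message` are hypothetical, and the nanosecond timestamp unit is an assumption that the schema above does not state.

```python
import socket
import time
import uuid
from dataclasses import dataclass, field
from typing import Dict, Mapping, Optional, Union


@dataclass
class EdgeMessage:
    """Plain-Python stand-in for the Heka message (not the real protobuf)."""
    uuid: bytes          # Uuid: random internal identifier
    timestamp: int       # Timestamp: server clock (assumed nanoseconds)
    hostname: str        # Hostname of the edge server
    type: str            # Type: Kafka topic name
    fields: Dict[str, Union[str, bytes]] = field(default_factory=dict)


def build_message(topic: str, uri: str, body: bytes, protocol: str,
                  args: Optional[str] = None,
                  remote_addr: Optional[str] = None,
                  headers: Optional[Mapping[str, str]] = None) -> EdgeMessage:
    """Wrap one submission; the body is stored untouched as opaque bytes."""
    fields: Dict[str, Union[str, bytes]] = {
        "uri": uri,
        "content": body,
        "protocol": protocol,
    }
    # Optional fields are only set when a value is present.
    if args:
        fields["args"] = args
    if remote_addr:
        fields["remote_addr"] = remote_addr
    for name, value in (headers or {}).items():
        fields[name] = value
    return EdgeMessage(
        uuid=uuid.uuid4().bytes,
        timestamp=time.time_ns(),
        hostname=socket.gethostname(),
        type=topic,
        fields=fields,
    )
```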
Server Request/Response
GET Request
Accept GET on /status, returning OK if all is well. This can be used to check the health of web servers.
GET Response codes
- 200 - OK. /status and all's well
- 404 - Any GET other than /status
- 500 - All is not well
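As a small illustration of the GET rules above, the function below maps a path and a health flag to a status code. The function name and the `healthy` flag are hypothetical. Checking the path before the health state is also an assumption, since the list does not say which takes precedence.

```python
def get_status_code(path: str, healthy: bool) -> int:
    """Return the status code for a GET request, per the list above."""
    if path != "/status":
        return 404  # any GET other than /status
    return 200 if healthy else 500
```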
POST/PUT Request
Treat POST and PUT the same. Accept POST or PUT to URLs of the form
^/submit/namespace/[id[/dimensions]]$
Example Telemetry format:
/submit/telemetry/docId/docType/appName/appVersion/appUpdateChannel/appBuildID
Specific Telemetry example:
/submit/telemetry/ce39b608-f595-4c69-b6a6-f7a436604648/main/Firefox/61.0a1/nightly/20180328030202
Example non-Telemetry format:
/submit/namespace/doctype/docversion/docid
Specific non-Telemetry example:
/submit/eng-workflow/hgpush/1/2c3a0767-d84a-4d02-8a92-fa54a3376049
Note that id above is a unique document ID, which is used for de-duping submissions. This is not intended to be the clientId field from Telemetry. If id is omitted, we will not be able to de-dupe based on submission URLs. It is recommended that id be a UUID.
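The two URL layouts above can be split into named parts with a few lines of code. The sketch below is illustrative only: `parse_submit_path` and the key names are hypothetical, and for simplicity it only accepts paths that carry every segment of the layout, which is stricter than the general form that allows id to be omitted.

```python
from typing import Dict, Optional

# Segment names taken from the example formats above.
TELEMETRY_KEYS = ("docId", "docType", "appName", "appVersion",
                  "appUpdateChannel", "appBuildID")
GENERIC_KEYS = ("docType", "docVersion", "docId")


def parse_submit_path(path: str) -> Optional[Dict[str, str]]:
    """Split a /submit/... path into named parts, or return None."""
    parts = path.split("/")
    # A valid path splits into "", "submit", namespace, then the segments.
    if len(parts) < 3 or parts[0] != "" or parts[1] != "submit" or not parts[2]:
        return None
    namespace, rest = parts[2], parts[3:]
    keys = TELEMETRY_KEYS if namespace == "telemetry" else GENERIC_KEYS
    if len(rest) != len(keys) or "" in rest:
        return None  # simplification: require the full layout
    parsed = dict(zip(keys, rest))
    parsed["namespace"] = namespace
    return parsed
```

Note that the document ID is the first segment in the Telemetry layout and the last one in the non-Telemetry layout, which is why the two key tuples differ in order.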
POST/PUT Response codes
- 200 - OK. Request accepted into the pipeline.
- 400 - Bad request, for example an un-encoded space in the URL.
- 404 - not found - POST/PUT to an unknown namespace
- 405 - wrong request type (anything other than POST/PUT)
- 411 - missing content-length header
- 413 - request body too large (Note that if we have badly-behaved clients that retry on 4XX, we should send back 202 on body/path too long).
- 414 - request path too long (See above)
- 500 - internal error
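The response codes above amount to a sequence of checks on the request line and headers. The sketch below shows one possible ordering. The limits, the namespace set, and the order of the checks are all assumptions made for illustration, and header lookup is simplified to an exact-case match on `Content-Length`.

```python
from typing import Mapping

MAX_BODY_BYTES = 1_000_000   # hypothetical limit
MAX_PATH_BYTES = 1024        # hypothetical limit
KNOWN_NAMESPACES = {"telemetry", "eng-workflow"}  # hypothetical configuration


def submit_status_code(method: str, path: str,
                       headers: Mapping[str, str]) -> int:
    """Return the status code for a request to a /submit/... path."""
    if method not in ("POST", "PUT"):
        return 405  # wrong request type
    if len(path) > MAX_PATH_BYTES:
        return 414  # request path too long
    if " " in path:
        return 400  # e.g. an un-encoded space in the URL
    parts = path.split("/")
    if len(parts) < 3 or parts[1] != "submit" or parts[2] not in KNOWN_NAMESPACES:
        return 404  # unknown namespace
    length = headers.get("Content-Length")
    if length is None:
        return 411  # missing Content-Length header
    if not length.isdigit():
        return 400  # malformed Content-Length
    if int(length) > MAX_BODY_BYTES:
        return 413  # request body too large
    return 200      # accepted into the pipeline
```

A deployment with clients that retry on 4XX could return 202 in place of 413 and 414, as the note above suggests.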
Other Considerations
Compression
It is not desirable to do decompression on the edge node. We want to pass along messages from the HTTP Edge node without "cracking the egg" of the payload.
We may also receive badly formed payloads, and we will want to track the incidence of such things within the main pipeline.
Bad Messages
Since the actual message is not examined by the edge server, the only failures that occur are those defined by the response status codes above. Messages are only forwarded to the pipeline when a response code of 200 is returned to the client.
GeoIP Lookups
No GeoIP lookup is performed by the edge server. If a client IP is available, the data warehouse loader performs the lookup and then discards the IP before the message hits long-term storage.
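The schema notes that the last X-Forwarded-For entry is treated as the client IP for the lookup. A downstream consumer could select that address roughly as sketched here. The function name is hypothetical, and falling back to remote_addr when the header is absent is an assumption rather than documented behavior.

```python
from typing import Optional


def client_ip(x_forwarded_for: Optional[str],
              remote_addr: Optional[str] = None) -> Optional[str]:
    """Pick the address to use for a GeoIP lookup, if any."""
    if x_forwarded_for:
        # The header is a comma-separated list; the last entry is used.
        entries = [e.strip() for e in x_forwarded_for.split(",") if e.strip()]
        if entries:
            return entries[-1]
    return remote_addr  # assumed fallback, may be None
```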
Data Retention
The edge server only stores data while batching and will have a retention time of moz_ingest_landfill_roll_timeout, which is generally only a few minutes.
Retention time for the S3 landfill, pub/sub, and the data warehouse is outside
the scope of this document.