Search Data

Introduction

This article introduces the datasets we maintain for search analyses. After reading this article, you should understand the search datasets well enough to produce moderately complex analyses.

Permissions

Access to both search_aggregates and search_clients_daily is heavily restricted in re:dash. We also maintain a restricted group for search on GitHub and Bugzilla. If you hit a 404 on GitHub, or cannot access a re:dash query or a bug, missing permissions are the likely cause. To get access, file a bug using the search permissions template.

Once you have proper permissions, you'll have access to a new source in re:dash called Presto Search. You will not be able to access any of the search datasets via the standard Presto data source, even with proper permissions.

Terminology

Direct vs Follow-on Search

Searches can be split into two major classes: direct and follow-on.

Direct searches result from a direct interaction with a search access point (SAP), which is part of the Firefox UI. These searches are often called SAP searches. There are currently 6 SAPs:

  • urlbar - entering a search query in the Awesomebar
  • searchbar - the main search bar; not present by default for new profiles on Firefox 57+
  • newtab - the search bar on the about:newtab page
  • abouthome - the search bar on the about:home page
  • contextmenu - selecting text and clicking "Search" from the context menu
  • system - starting Firefox from the command line with an option that immediately makes a search

Users will often interact with the Search Engine Results Page (SERP) to create "downstream" queries. These queries are called follow-on queries. These are sometimes also referred to as in-content queries since they are initiated from the content of the page itself and not from the Firefox UI.

For example, follow-on queries can be caused by:

  • Revising a query (restaurants becomes restaurants near me)
  • Clicking on the "next" button
  • Accepting spelling suggestions

Tagged vs Untagged Searches

Our partners (search engines) attribute queries to Mozilla using partner codes. When a user issues a query through one of our SAPs, we include our partner code in the URL of the resulting search.

Tagged queries are queries that include one of our partner codes.

Untagged queries are queries that do not include one of our partner codes. If a query is untagged, it's usually because we do not have a partner deal for that search engine and region.

If an SAP query is tagged, any follow-on query should also be tagged.
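
The tagged/untagged distinction can be illustrated with a minimal sketch. The host name, query parameter, and partner codes below are invented for illustration and are not real partner codes; the actual codes and URL formats are not listed in this article.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical lookup table: engine host -> (query parameter, partner codes).
# Every value here is made up for the example.
PARTNER_CODES = {
    "search.example.com": ("client", {"moz-example-1", "moz-example-2"}),
}


def is_tagged(url: str) -> bool:
    """Return True if the search URL carries one of the known partner codes."""
    parsed = urlparse(url)
    entry = PARTNER_CODES.get(parsed.hostname or "")
    if entry is None:
        # Unknown engine: no partner code to look for, so treat as untagged.
        return False
    param, codes = entry
    values = parse_qs(parsed.query).get(param, [])
    return any(value in codes for value in values)
```

Under these assumptions, a URL such as https://search.example.com/search?q=restaurants&client=moz-example-1 would be classified as tagged, while the same URL without the client parameter would be untagged.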

Standard Search Aggregates

We report three types of searches in our search datasets: SAP, tagged-sap, and tagged-follow-on. These aggregates show up as columns in the search_aggregates and search_clients_daily datasets. Our search datasets are all derived from main_summary. The aggregate columns are derived from the SEARCH_COUNTS histogram.

The SAP column counts all SAP (or direct) searches. SAP search counts are collected via probes within the Firefox UI. These counts are very reliable, but do not count follow-on queries.

In 2017-06 we deployed the [followonsearch addon], which adds probes for tagged-sap and tagged-follow-on searches. These columns attempt to count all tagged searches by looking for Mozilla partner codes in the URL of requests to partner search engines. These search counts are critical to understanding revenue since they exclude untagged searches and include follow-on searches. However, these search counts have important caveats affecting their reliability. See In Content Telemetry Issues for more information.

In main_summary, all of these searches are stored together in search_counts.count regardless of type, which makes it easy to overcount searches. Avoid using main_summary for search analyses.
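
To see why a single shared count field invites overcounting, here is a rough sketch that splits mixed records into the three aggregates. The record shape (engine, source, count) and the two prefix strings are assumptions made for this example; only the six SAP names come from this article.

```python
# The six SAP names listed earlier in this article.
SAP_SOURCES = {"urlbar", "searchbar", "newtab", "abouthome", "contextmenu", "system"}

# Assumed labels for the addon's probes (hypothetical for this sketch).
TAGGED_SAP_PREFIX = "in-content:sap:"
TAGGED_FOLLOW_ON_PREFIX = "in-content:sap-follow-on:"


def aggregate(search_counts):
    """Split mixed search count records into the three standard aggregates."""
    totals = {"sap": 0, "tagged_sap": 0, "tagged_follow_on": 0}
    for row in search_counts:
        source, count = row["source"], row["count"]
        if source in SAP_SOURCES:
            totals["sap"] += count
        elif source.startswith(TAGGED_FOLLOW_ON_PREFIX):
            totals["tagged_follow_on"] += count
        elif source.startswith(TAGGED_SAP_PREFIX):
            totals["tagged_sap"] += count
    return totals


# Made-up example records, not real data.
rows = [
    {"engine": "example", "source": "urlbar", "count": 3},
    {"engine": "example", "source": "in-content:sap:code", "count": 3},
    {"engine": "example", "source": "in-content:sap-follow-on:code", "count": 2},
]
```

With these made-up rows, a naive sum of count gives 8, while aggregate(rows) reports 3 SAP, 3 tagged-sap, and 2 tagged-follow-on searches. If the tagged-sap records describe the same searches the UI probe already counted as SAP, the naive total counts them twice.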

Outlier Filtering

We remove search count observations representing more than 10,000 searches for a single search engine in a single ping.
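
The filter can be sketched as follows, assuming each observation is a record holding the count for one search engine in one ping. The record shape and function name are hypothetical; only the 10,000 threshold comes from the text above.

```python
# Threshold stated in this article: observations above it are removed.
MAX_SEARCHES_PER_OBSERVATION = 10_000


def drop_outliers(observations):
    """Keep only observations with at most 10,000 searches."""
    return [
        obs for obs in observations
        if obs["count"] <= MAX_SEARCHES_PER_OBSERVATION
    ]
```

An observation of exactly 10,000 searches is kept under this reading, since only observations of more than 10,000 searches are removed.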

In Content Telemetry Issues

The [followonsearch addon] implements the probe used to measure tagged-sap and tagged-follow-on searches. This probe is critical to understanding our revenue. It's the only tool that gives us a view of follow-on searches and differentiates between tagged and untagged queries. However, it comes with some notable caveats.

Relies on whitelists

The [followonsearch addon] attempts to count all tagged searches by looking for Mozilla partner codes in the URL of requests to partner search engines. To do this, the addon relies on a whitelist of partner codes and URL formats. The list of partner codes is incomplete and only covers a few top partners. These codes also change occasionally, so there will be gaps in the data.

Additionally, changes to search engine URL formats can cause problems with our data collection. See this query for a notable example.

Addon uptake

This probe is shipped as an addon. Firefox 55 and later have the addon installed by default (Bug). The addon was deployed to older versions of Firefox via GoFaster, but uptake is not 100%.

Limited historical data

The addon was first deployed in 2017-06. No tagged-* search data is available before that date.