About the Project

Goals

  1. Quantify the impact of third-party scripts on the web.
  2. Identify the third-party scripts on the web that have the greatest performance cost.
  3. Give developers the information they need to make informed decisions about which third parties to include on their sites.
  4. Incentivize responsible third-party script behavior.
  5. Make this information accessible and useful.

Methodology

HTTP Archive is an initiative that tracks how the web is built. Every month, ~4 million sites are crawled with Lighthouse on mobile. Lighthouse breaks down the total script execution time of each page and attributes that execution to a script URL. Using BigQuery, this project aggregates script execution at the origin level and assigns each origin to the entity responsible for it.
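The aggregation step can be sketched in plain Python. This is a minimal illustration, not the project's BigQuery query: the input rows, the `ENTITY_BY_HOST` mapping, and the function names are hypothetical, and an origin is approximated here by the script URL's hostname.

```python
from collections import defaultdict
from urllib.parse import urlsplit

# Hypothetical mapping from origin hostnames to entity names; the real
# project keeps this information in data/entities.json5.
ENTITY_BY_HOST = {
    "www.google-analytics.com": "Google Analytics",
    "connect.facebook.net": "Facebook",
}


def origin_of(url: str) -> str:
    """Return the hostname of a script URL (used as a stand-in for its origin)."""
    return urlsplit(url).hostname or ""


def execution_by_entity(rows):
    """Sum script execution time (ms) per entity.

    `rows` is an iterable of (page_url, script_url, execution_ms) tuples,
    one per script per page. Origins with no known entity are grouped
    under their own hostname.
    """
    totals = defaultdict(float)
    for _page, script_url, execution_ms in rows:
        host = origin_of(script_url)
        entity = ENTITY_BY_HOST.get(host, host)
        totals[entity] += execution_ms
    return dict(totals)
```

For example, two Facebook scripts on different pages (120 ms and 80 ms) and one Google Analytics script (50 ms) would yield `{"Facebook": 200.0, "Google Analytics": 50.0}`.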

FAQs

I don't see entity X in the list. What's up with that?

This can be for one of several reasons:

  1. The entity's origins are not referenced on at least 50 pages in the dataset.
  2. The entity's origins have not yet been identified. See "How can I contribute?" below.
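The 50-page threshold in the first reason can be illustrated with a small filter. This sketch assumes the input is a list of (page URL, entity name) pairs and counts each page once per entity; the function name and input shape are hypothetical.

```python
from collections import defaultdict

MIN_PAGES = 50  # threshold stated in the FAQ


def entities_to_list(occurrences):
    """Return the entities that appear on at least MIN_PAGES distinct pages.

    `occurrences` is an iterable of (page_url, entity_name) pairs; a page
    that includes several scripts from the same entity is counted once.
    """
    pages = defaultdict(set)
    for page_url, entity in occurrences:
        pages[entity].add(page_url)
    return sorted(e for e, p in pages.items() if len(p) >= MIN_PAGES)
```

An entity seen on 49 distinct pages is left out, even if it loads many scripts on each of them.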

What is "Total Occurrences"?

Total Occurrences is the number of pages on which the entity is included.

How is the "Average Impact" determined?

The HTTP Archive dataset includes Lighthouse reports for each URL on mobile. Lighthouse has an audit called "bootup-time" that summarizes the amount of time that each script spent on the main thread. The "Average Impact" for an entity is the total execution time of scripts whose domain matches one of the entity's domains divided by the total number of pages that included the entity.

Average Impact = Total Execution Time / Total Occurrences
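As a worked example of the formula, the sketch below takes, for one entity, the total execution time of its scripts on each page that includes it. The input shape and function name are assumptions for illustration.

```python
def average_impact(execution_ms_per_page):
    """Average Impact = Total Execution Time / Total Occurrences.

    `execution_ms_per_page` maps each page that includes the entity to the
    total main-thread execution time (ms) of the entity's scripts on that page.
    """
    total_occurrences = len(execution_ms_per_page)
    if total_occurrences == 0:
        return 0.0
    total_execution = sum(execution_ms_per_page.values())
    return total_execution / total_occurrences
```

With 300 ms, 100 ms, and 20 ms on three pages, the Average Impact is 420 / 3 = 140 ms. A page that includes the entity but records 0 ms still counts as an occurrence and pulls the average down.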

How does Lighthouse determine the execution time of each script?

Lighthouse's bootup-time audit attempts to attribute every top-level main-thread task to a URL. A main-thread task is attributed to the first script URL found in its stack. If you're interested in helping us improve this logic, see Contributing for details.
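The attribution rule can be modeled in a few lines. This is a simplified sketch, not Lighthouse's implementation: it assumes each task comes with an ordered list of URLs from its stack and a known set of script URLs, and the "Other" bucket for unattributed time is a hypothetical label.

```python
from collections import defaultdict


def attribute_task(stack_urls, script_urls):
    """Return the first URL in the task's stack that is a known script URL.

    `stack_urls` lists the URLs seen in the task's stack, in the order they
    are searched (an assumption of this sketch); `script_urls` is the set of
    URLs that were loaded as scripts. Returns None if none match.
    """
    for url in stack_urls:
        if url in script_urls:
            return url
    return None


def execution_by_url(tasks, script_urls):
    """Sum top-level task durations (ms) per attributed script URL."""
    totals = defaultdict(float)
    for stack_urls, duration_ms in tasks:
        url = attribute_task(stack_urls, script_urls) or "Other"
        totals[url] += duration_ms
    return dict(totals)
```

Because only the first matching URL receives credit, a library that calls into another script gets the whole task's time, which is one reason this logic is open to improvement.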

The data for entity X seems wrong. How can it be corrected?

Verify that the origins in data/entities.json5 are correct. Most issues are simply the result of mislabelled shared origins. If everything checks out, no further action is likely needed and the data is valid. If you still believe there are errors, file an issue to discuss further.
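One quick check for mislabelled shared origins is to look for domains listed under more than one entity. The sketch below uses a hypothetical excerpt written as a Python list; the `name` and `domains` fields are assumptions about the file's shape, not its documented schema.

```python
from collections import defaultdict

# Hypothetical excerpt in the spirit of data/entities.json5; field names
# are assumed for illustration.
ENTITIES = [
    {"name": "Entity A", "domains": ["cdn.shared-host.example", "a.example"]},
    {"name": "Entity B", "domains": ["cdn.shared-host.example", "b.example"]},
]


def shared_domains(entities):
    """Return domains listed under more than one entity, mapped to the claimants."""
    owners = defaultdict(list)
    for entity in entities:
        for domain in entity["domains"]:
            owners[domain].append(entity["name"])
    return {d: names for d, names in owners.items() if len(names) > 1}
```

Here `shared_domains(ENTITIES)` reports `cdn.shared-host.example` as claimed by both entities, which is the kind of entry worth double-checking.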

How can I contribute?

Only about 90% of third-party script execution has been assigned to an entity. We could use your help identifying the rest! See Contributing for details.
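The coverage figure is the share of execution time whose origin maps to a known entity. A minimal sketch, assuming per-origin totals and an origin-to-entity mapping are already available (both inputs and the function name are hypothetical), also lists the heaviest unassigned origins as candidates for contribution.

```python
def assignment_coverage(execution_by_origin, entity_by_origin, top_n=3):
    """Return (share of execution time assigned to an entity, top unassigned origins).

    `execution_by_origin` maps origin hostnames to total execution time (ms);
    `entity_by_origin` maps known origins to entity names.
    """
    total = sum(execution_by_origin.values())
    unassigned = {o: ms for o, ms in execution_by_origin.items() if o not in entity_by_origin}
    assigned_share = 1 - sum(unassigned.values()) / total if total else 0.0
    # Largest unassigned origins first: the most valuable ones to identify.
    top = sorted(unassigned, key=unassigned.get, reverse=True)[:top_n]
    return assigned_share, top
```

With 900 ms assigned and 100 ms spread over two unknown origins, the assigned share is 0.9 and the unknown origins are returned largest first.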
