The semantics of online authentication on the web are rather straightforward: if Alice has a certificate binding Bob’s name to a public key, and a remote entity can prove knowledge of Bob’s private key, then (barring key compromise) that remote entity must be Bob. In reality, however, many websites, including the majority of the most popular ones, are hosted at least in part by third parties such as Content Delivery Networks (CDNs) or web hosting providers. Put simply: administrators of websites that handle (extremely) sensitive user data are giving their private keys to third parties. Importantly, this key sharing is undetectable by most users and largely unknown even among researchers.
In this paper, we perform a large-scale measurement study of key sharing in today’s web. We analyze the prevalence with which websites trust third-party hosting providers with their secret keys, as well as the impact that this trust has on responsible key management practices, such as revocation. Our results reveal that key sharing is extremely common, with a small handful of hosting providers having keys from the majority of the most popular websites. We also find that hosting providers often manage their customers’ keys, and that they tend to react more slowly yet more thoroughly to compromised or potentially compromised keys.
In general, by key sharing we mean the scenario in which one party makes its certificate's private key available to another party. Because this is difficult to observe as an outsider, we define key sharing in terms of what we can observe:
We say that key sharing has taken place if any party named in a certificate (either in the Common Name or in the SAN list) is not the same entity as the organization that owns the IP address from which the certificate is served.
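The observable definition reduces to a simple check over each certificate. The sketch below assumes a hypothetical `ObservedCert` record (Common Name, SAN list, and the organization owning the serving IP) and a domain-to-organization mapping like the one released in the Domain to Organization Mapping dataset. Skipping domains that are missing from the mapping is a conservative choice made for this illustration, not necessarily what the paper does.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ObservedCert:
    """A leaf certificate as seen on the wire (hypothetical record type)."""
    common_name: str
    san: tuple[str, ...]
    ip_owner: str  # organization that owns the IP address serving this cert


def named_domains(cert: ObservedCert) -> set[str]:
    """All parties named on the certificate: the Common Name plus SAN entries."""
    return {cert.common_name, *cert.san}


def is_key_shared(cert: ObservedCert, domain_to_org: dict[str, str]) -> bool:
    """Key sharing has taken place if any named domain belongs to an
    organization other than the one that owns the serving IP address.

    Domains absent from domain_to_org are skipped (an assumption of this sketch).
    """
    for domain in named_domains(cert):
        org = domain_to_org.get(domain)
        if org is not None and org != cert.ip_owner:
            return True
    return False


# Hypothetical domain-to-organization mapping.
domain_to_org = {
    "example.com": "Example Corp",
    "www.example.com": "Example Corp",
}

served_by_cdn = ObservedCert("example.com", ("www.example.com",), ip_owner="Some CDN")
self_hosted = ObservedCert("example.com", ("www.example.com",), ip_owner="Example Corp")

print(is_key_shared(served_by_cdn, domain_to_org))  # True
print(is_key_shared(self_hosted, domain_to_org))    # False
```

The same certificate counts as shared or not depending solely on who owns the address it is served from, which is exactly why the definition is observable from the outside.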
The security of any public key encryption system rests on keeping private keys private; sharing private keys across entities violates this assumption. A single website choosing to share its private key with a hosting provider may seem relatively innocuous, but large numbers of websites sharing with a small number of hosting providers may lead to even greater centralization of trust than was previously realized. Our results expose trust relationships in the HTTPS ecosystem, complementing a large body of work (see §7 in our paper) that has studied similar trust relationships between websites and CAs.
Name | Type | Size | Format | SHA-256 Hash (Compressed) | Labels |
---|---|---|---|---|---|
Leaf Certificates | gzipped tsv (tab-separated values) | 1.1 GB | README | dc025023fa1fde39c98ce928e84a5158c49949eeed3bc6c6ffb044ba71eb25f4 | CERTS RDNS |
IP to ASN | gzipped directory | 292 MB | README | 3e9931bc0e6b09efd32abc1e2e362bef3febd65622d87a368088a71e39133fb7 | ASN |
ASN to Organization | gzipped directory | 35 MB | README | 984ed95082cf4a8b3dd114f589e698bc04fbab9a55fdfd727371cc709ca69dd7 | ASN |
WHOIS Record Emails | gzipped ssv (space-separated values) | 68 MB | README | 1efc7fbce69b7e01ac2f2b756e9c0ef3ba5fa0bc1dd49b341b2ec5bb3637c524 | WHOIS |
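The published hashes are computed over the compressed files, so a download can be checked before it is extracted. A minimal sketch using Python's standard `hashlib`, streaming the file in chunks so the 1.1 GB archive never has to fit in memory; the local file name in the usage note is hypothetical.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare the archive's digest against a published (compressed-file) hash.

    The expected value is normalized, so upper- or lower-case hex both work.
    """
    return sha256_file(path) == expected_hex.strip().lower()
```

For example, `verify_download("leaf_certificates.tsv.gz", "dc025023fa1fde39c98ce928e84a5158c49949eeed3bc6c6ffb044ba71eb25f4")` should return `True` for an intact Leaf Certificates download. On the command line, `sha256sum` from GNU coreutils prints the same digest.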
The first dataset groups all of the domains in our dataset into separate organizational entities. This is an important tool in our study because it allows us to identify cruise-liner certificates, whose names span many unrelated organizations (as opposed to certificates with many domain names from a single organization, as with Google). Also, reporting how many organizations share their keys, rather than how many domains, avoids over-inflating the numbers: a single organization’s decision to use a third-party hosting provider could result in all of its domains’ keys being shared, and some organizations own hundreds of domains.
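Once every domain maps to an organization, both uses described above become set operations. In this sketch the cruise-liner rule ("names belong to more than one organization") is a hypothetical threshold chosen for illustration, and the mapping and domain names are made up.

```python
def orgs_on_cert(domains, domain_to_org):
    """Set of organizations that own the given domains (unknown domains skipped)."""
    return {domain_to_org[d] for d in domains if d in domain_to_org}


def is_cruise_liner(domains, domain_to_org):
    """Hypothetical rule: a certificate is a cruise-liner if the names on it
    belong to more than one organization."""
    return len(orgs_on_cert(domains, domain_to_org)) > 1


def count_sharing_orgs(shared_domains, domain_to_org):
    """Count organizations rather than domains, so an organization with
    hundreds of shared domains is counted once."""
    return len(orgs_on_cert(shared_domains, domain_to_org))


# Hypothetical mapping: one multi-brand company, two unrelated small sites,
# and one retailer that owns 300 domains.
domain_to_org = {
    "google.com": "Google", "youtube.com": "Google",
    "shop-a.com": "Shop A", "blog-b.org": "Blog B",
}
domain_to_org.update({f"brand{i}.com": "Big Retailer" for i in range(300)})

print(is_cruise_liner(["google.com", "youtube.com"], domain_to_org))  # False
print(is_cruise_liner(["shop-a.com", "blog-b.org"], domain_to_org))   # True
print(count_sharing_orgs([f"brand{i}.com" for i in range(300)], domain_to_org))  # 1
```

Counting the retailer once instead of 300 times is the over-inflation the organization-level view avoids.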
As outlined in the figure above, we first created a graph linking all domains from our certificate dataset to the email addresses appearing in their WHOIS records, and then used the Louvain community detection algorithm to cluster these domains into organizations. For more details, please see Section 4.1 in our paper.
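To make the graph construction concrete, here is a deliberately simplified sketch. It is not the Louvain algorithm: it builds the same kind of bipartite domain-email graph but groups domains by connected components using a union-find structure. Plain connected components tend to over-merge when many unrelated domains share one WHOIS address (for example, a registrar's privacy-proxy address), which is one reason a community-detection method such as Louvain fits better; this sketch sidesteps the problem with a hypothetical `ignore_emails` list. For the real algorithm, recent NetworkX releases provide `louvain_communities`.

```python
from collections import defaultdict


class DisjointSet:
    """Union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def group_domains(whois_emails, ignore_emails=frozenset()):
    """whois_emails maps domain -> iterable of WHOIS emails.

    Returns groups of domains connected through shared emails
    (connected components, NOT Louvain communities).
    """
    ds = DisjointSet()
    for domain, emails in whois_emails.items():
        ds.find(("domain", domain))  # register domains with no usable email
        for email in emails:
            if email not in ignore_emails:
                ds.union(("domain", domain), ("email", email))
    groups = defaultdict(set)
    for domain in whois_emails:
        groups[ds.find(("domain", domain))].add(domain)
    return sorted(groups.values(), key=min)  # deterministic order


# Hypothetical WHOIS data; privacy@proxy.example stands in for a privacy proxy.
whois = {
    "a.com": ["admin@a.com"],
    "a.net": ["admin@a.com"],
    "b.org": ["ops@b.org"],
    "c.io": ["privacy@proxy.example", "x@c.io"],
    "d.io": ["privacy@proxy.example"],
}
print(group_domains(whois, ignore_emails={"privacy@proxy.example"}))
```

Without the ignore list, `c.io` and `d.io` would land in one group solely because they share the proxy address.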
The next dataset identifies which third-party organizations host a given certificate. We first identify all possible hosting providers by looking up each IP address from our certificate dataset in our reverse DNS and ASN datasets. We then unify these for each certificate (e.g., a reverse DNS entry of softlayer.com and an AS Organization Name of Soft-Layer Technologies Inc. represent the same organization and thus should only be counted once). Finally, using the domain ownership methodology from the previous section, we conclude that a certificate is...
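A rough sketch of the lookup-and-unify step, under several stated assumptions: the IP-to-ASN table is a hypothetical dict of prefixes (longest-prefix match, linear scan for clarity), and unification is a hypothetical name-normalization heuristic that reduces both a reverse DNS name and an AS organization name to one lowercase key. It handles the SoftLayer example from the text but is far from complete (it mishandles suffixes such as `.co.uk`, and providers whose hostnames differ from their AS names will not merge). Prefixes use the 198.51.100.0/24 documentation range and ASNs from the documentation range.

```python
import ipaddress
import re

# Hypothetical legal-form words stripped from AS organization names.
CORPORATE_WORDS = {"inc", "llc", "ltd", "limited", "corp", "corporation",
                   "co", "gmbh", "technologies"}


def lookup_asn(ip, prefix_to_asn):
    """Longest-prefix match of an IP address against announced prefixes."""
    addr = ipaddress.ip_address(ip)
    best_len, best_asn = -1, None
    for prefix, asn in prefix_to_asn.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and net.prefixlen > best_len:  # v4/v6 mismatch is just False
            best_len, best_asn = net.prefixlen, asn
    return best_asn


def key_from_as_org(name):
    """'Soft-Layer Technologies Inc.' -> 'softlayer'"""
    tokens = re.findall(r"[a-z0-9]+", name.lower().replace("-", ""))
    return "".join(t for t in tokens if t not in CORPORATE_WORDS)


def key_from_rdns(hostname):
    """'softlayer.com' -> 'softlayer' (naively takes the label left of the TLD)."""
    labels = hostname.lower().rstrip(".").split(".")
    label = labels[-2] if len(labels) >= 2 else labels[0]
    return label.replace("-", "")


def hosting_providers(ip, rdns_name, prefix_to_asn, asn_to_org):
    """Unified set of candidate hosting providers for one serving IP."""
    keys = set()
    if rdns_name:
        keys.add(key_from_rdns(rdns_name))
    asn = lookup_asn(ip, prefix_to_asn)
    if asn is not None and asn in asn_to_org:
        keys.add(key_from_as_org(asn_to_org[asn]))
    keys.discard("")
    return keys


prefix_to_asn = {"198.51.0.0/16": 64500, "198.51.100.0/24": 64501}
asn_to_org = {64500: "Example Transit LLC", 64501: "Soft-Layer Technologies Inc."}
print(hosting_providers("198.51.100.7", "softlayer.com", prefix_to_asn, asn_to_org))
```

Both sources collapse to the single key `softlayer`, so the provider is counted once for this certificate.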
This dataset records who manages each certificate: the organization(s) named on the certificate, or the hosting provider serving it. Determining who is revoking or reissuing a certificate is nontrivial: revocations and reissues do not express who exactly requested them (after all, the PKI was designed on the premise that the entity listed on the certificate is the sole owner of the secret key).
Our insight is that hosting providers who manage their customers' certificates must obtain many new certificates, and would therefore, out of convenience, likely gravitate toward a small set of certificate authorities. More specifically, when the customers of a given provider (mostly) obtain their own certificates, we anticipate that the distribution of issuers among that provider's certificates will resemble the distribution of CAs across the entire population of certificates. On the other hand, when a hosting provider manages certificates on its customers' behalf, we anticipate the distribution will be skewed heavily toward a small set of issuing certificates. For more details, please see Section 6.1 in our paper.
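One way to make "resembles the global distribution" concrete is to compare the two issuer distributions with a distance measure. The sketch below uses total variation distance and a threshold of 0.5; both the metric and the threshold are hypothetical choices for illustration, not the test used in the paper, and the issuer names and counts are made up.

```python
from collections import Counter


def issuer_distribution(issuers):
    """Fraction of certificates signed by each issuer."""
    counts = Counter(issuers)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no certificates")
    return {ca: n / total for ca, n in counts.items()}


def total_variation(p, q):
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def likely_provider_managed(provider_issuers, global_issuers, threshold=0.5):
    """Hypothetical rule: flag a provider as managing its customers'
    certificates when its issuer mix departs strongly from the global mix."""
    p = issuer_distribution(provider_issuers)
    q = issuer_distribution(global_issuers)
    return total_variation(p, q) > threshold


global_issuers = ["CA-A"] * 40 + ["CA-B"] * 30 + ["CA-C"] * 30
managed = ["CA-D"] * 95 + ["CA-A"] * 5               # skewed toward one issuer
self_managed = ["CA-A"] * 4 + ["CA-B"] * 3 + ["CA-C"] * 3

print(likely_provider_managed(managed, global_issuers))       # True
print(likely_provider_managed(self_managed, global_issuers))  # False
```

Total variation only measures how different the two mixes are; a stricter rule could additionally require that a single issuer dominate, matching the "small set of issuing certificates" intuition more directly.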
Name | Type | Size | Format | SHA-256 Hash (Compressed) |
---|---|---|---|---|
Domain to Organization Mapping | ssv (space-separated values) | 35 MB | README | 926a2ffd5e57cbab32a2d891c31b040532f67848dcaff422e34095f92d1a6120 |
Third-Party Services Hosting Each Certificate | gzipped ssv (space-separated values) | 138 MB | README | 1f38a7eaef45ab3a8b32fb0d95e248e8769838e5252eda39a386977986e6e80e |
Management Policy of Third-Party Hosting Services | 2 gzipped tsv files | 3.1 MB | README | eea41858f6c2b7636366acca93502794985b9f59ac690f9d92f53a0e238935fd |
If you have any questions, comments or concerns, or if you're interested in using our data in your research, please email Frank Cangialosi!