A bitwise analysis of AWS access key identifiers

Pallavi Sivakumaran

Summary

AWS access keys are credentials used within the AWS ecosystem to enable programmatic access to AWS resources. An access key has two parts: the access key ID and the secret access key [5]. Access key IDs have been the focus of research over the past few years, and researchers have found that the AWS account number can be derived from an access key ID associated with the account. Whether additional, possibly security-relevant, information can also be derived from access key IDs is still an open question.

We performed a bitwise analysis over a large dataset of 3000+ AWS access key IDs and discovered the following:

Bit 20 (0-indexed) of an access key ID indicates the “generation” of the access key: 0 is the old generation, 1 is the new generation.
Bits 21-60 are the binary representation of the AWS account number in new generation access key IDs only. We don't yet know of a way to get the AWS account number from an old generation access key ID.
Bit 92 has a 3:1 bias towards a value of 1. None of the other bit positions (disregarding 0-19, which indicate the identifier type) exhibit such a bias. This observation does not hold true for older generation access key IDs. The reason for this bias is as yet unknown.

This blog post documents our approach, details of the above observations, as well as open questions from our analysis into the access key ID component of an AWS access key.

1. Access key analysis

1.1 What are access keys?

AWS uses access keys as long-term credentials for IAM or root users [5]. The access key ID component of an AWS access key is akin to a username and the secret access key is similar to a password. There are also temporary access keys, which include a session token in addition to the access key ID and secret access key.

As its name suggests, the secret access key component contains sensitive information and is expected to be stored securely, while the access key ID isn't widely regarded as being sensitive. However, in the same way that usernames might reveal some information about users or the environment they're used in, we believe access key IDs also have the potential to reveal interesting information about the cloud estate they're used in.

1.2 What does the research community already know about access key IDs?

There’s been a fair amount of research into the structure and format of AWS access keys (the ID component, in particular) by the cloud security/research community [1][2][3][4]. Some of the key findings as of September 2024 are:

Access key IDs are base32-encoded.
The AWS account ID that an access key is associated with can be obtained from 5 bytes of the access key ID [2][3].
There are two “generations” of access keys: pre-2019 and post-2019 (approximately). With the older generation, the keys all ended in A or Q, and the 5th letter was always I or J (based on observations from a very small dataset) [1].

1.3 Initial hypothesis and testing: Extracting AWS account number from access key ID

Note: This hypothesis was made and testing was conducted over a year ago. Some of the results are now known from other researchers’ works as well.

Hypothesis: Our analysis of AWS access key IDs began when we considered the fact that AWS credentials, as stored locally for an IAM user, consist of only the access key ID and secret access key. We considered it highly unlikely that AWS stores and queries all access keys for all IAM users in all AWS accounts within a single database table, since that would be very inefficient. We therefore hypothesized that the AWS account number could be obtained from the access key ID itself.

Testing: We generated access keys for different users within a test AWS account and compared them with access keys we had gathered from other accounts. We observed that characters 4-11 (0-indexed) were always the same for access key IDs belonging to a single account. We converted the access key ID to its binary equivalent and found that the account number was in fact present in full as bits 21-60. That is, if you extract bits 21-60 from the binary representation of an access key ID and interpret them as an unsigned integer, you get the AWS account number.

We found that this is true for some other types of AWS IAM identifiers as well, e.g., role identifiers.
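The extraction described above can be sketched in Python. This is a minimal illustration under our own assumptions: the standard RFC 4648 base32 alphabet (A-Z, 2-7), bit positions counted from 0 at the most significant bit of the first character, and the usual 20-character ID length. The helper names are hypothetical, and make_synthetic_key_id builds a fake ID from a made-up account number so that no real key is needed.

```python
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def key_id_to_bits(key_id: str) -> str:
    """Return the ID as a '0'/'1' string, 5 bits per character, MSB first."""
    return "".join(format(BASE32_ALPHABET.index(c), "05b") for c in key_id)


def bits_to_key_id(bits: str) -> str:
    """Inverse of key_id_to_bits; len(bits) must be a multiple of 5."""
    return "".join(BASE32_ALPHABET[int(bits[i:i + 5], 2)] for i in range(0, len(bits), 5))


def account_from_key_id(key_id: str) -> int:
    """Interpret bits 21-60 (inclusive) as an unsigned integer.

    Only meaningful for new-generation IDs, where bit 20 is 1.
    """
    return int(key_id_to_bits(key_id)[21:61], 2)


def make_synthetic_key_id(account: int, prefix: str = "AKIA") -> str:
    """Hypothetical test helper: prefix (bits 0-19), bit 20 = 1, the account
    number in bits 21-60, and zero padding up to 100 bits (20 characters)."""
    bits = key_id_to_bits(prefix) + "1" + format(account, "040b")
    return bits_to_key_id(bits.ljust(100, "0"))


if __name__ == "__main__":
    fake = make_synthetic_key_id(123456789012)  # made-up account number
    print(fake, account_from_key_id(fake))
```

Running it prints the synthetic ID followed by 123456789012. As discussed in section 2, the same bits do not yield the account number for old-generation IDs.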

2. The older generation of access key IDs

The above finding did not hold for any of the older-generation access key IDs listed in Scott Piper’s blog post from 2018 [1].

Differentiating between access key generations: If there are two generations of access key IDs, there must be some way of distinguishing between them. We know that bits 0-19 denote the identifier type [5], and our research showed that bits 21-60 contain the account number (in the newer generation of access keys). We therefore focused on bit 20, and found that it had a value of 1 for all new access keys and 0 for all older ones. We conclude that this bit identifies the generation, with the caveat that we have very few older-generation access keys against which to verify this theory.

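In addition, in the older-generation access key IDs, bits 21-23 were always 100 and the last four bits were always 0000 (on top of bit 20 always being 0). The values of bits 20-23 (0100) explain other researchers’ observation that the 5th letter was always I or J: the 5th character encodes bits 20-24, and in base32 I = 01000 and J = 01001. Likewise, the last four bits being 0000 is consistent with these keys always ending in A (00000) or Q (10000).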
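A small Python sketch makes this bit arithmetic concrete. It assumes the RFC 4648 alphabet and 20-character IDs; key_generation is a hypothetical helper that simply applies the bit-20 heuristic from our (small) sample of old-generation keys. The two list comprehensions search the alphabet for characters matching the old-generation bit patterns: the 5th character carries bits 20-24, and the last character of a 20-character ID carries bits 95-99.

```python
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def key_generation(key_id: str) -> str:
    """Classify an ID by bit 20, the most significant bit of the 5th character."""
    bit20 = BASE32_ALPHABET.index(key_id[4]) >> 4
    return "new" if bit20 == 1 else "old"


# 5th character = bits 20-24. Old generation: bit 20 = 0 and bits 21-23 = 100,
# so the top four bits of that character must be 0100.
fifth_char_candidates = [c for v, c in enumerate(BASE32_ALPHABET) if v >> 1 == 0b0100]

# Last character = bits 95-99. "Last four bits 0000" means its low four bits are zero.
last_char_candidates = [c for v, c in enumerate(BASE32_ALPHABET) if v & 0b1111 == 0]

for c in fifth_char_candidates + last_char_candidates:
    print(c, format(BASE32_ALPHABET.index(c), "05b"))
# I 01000
# J 01001
# A 00000
# Q 10000
```

The search returns exactly I and J for the 5th character and A and Q for the last one, matching the observations in [1]. It also follows that bit 20 is 1 precisely when the 5th character is one of Q-Z or 2-7 (base32 values 16-31).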

Testing the older generation of access keys: Using the STS:GetAccessKeyInfo API call against old generation access key IDs returns valid account numbers. However, unlike with newer access key IDs, modifying a single character within the access key ID does not return another valid account number. This implies that the relationship between account number and access key ID isn’t as straightforward as it is in the new generation.
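The single-character test can be scripted along the following lines. This is a hedged sketch: single_char_variants and probe are hypothetical helpers, and lookup is any function that maps an access key ID to an account number. With boto3, for example, lookup could be lambda k: sts.get_access_key_info(AccessKeyId=k)["Account"] with sts = boto3.client("sts"); the lookup is passed in rather than hard-wired so that the sketch runs without AWS credentials.

```python
from typing import Callable, Dict, Iterator, Tuple

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def single_char_variants(key_id: str, start: int = 4) -> Iterator[Tuple[int, str]]:
    """Yield (position, variant) for every ID that differs from key_id in exactly
    one character at index >= start (the 4-character type prefix is left alone)."""
    for pos in range(start, len(key_id)):
        for c in BASE32_ALPHABET:
            if c != key_id[pos]:
                yield pos, key_id[:pos] + c + key_id[pos + 1:]


def probe(key_id: str, lookup: Callable[[str], str]) -> Dict[str, str]:
    """Map each single-character variant to lookup(variant), or to the error type."""
    results: Dict[str, str] = {}
    for _, variant in single_char_variants(key_id):
        try:
            results[variant] = lookup(variant)
        except Exception as exc:  # record the failure and keep probing
            results[variant] = f"error: {type(exc).__name__}"
    return results
```

A 20-character ID yields 16 × 31 = 496 variants, so against the real API this amounts to 496 separate GetAccessKeyInfo requests.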

Open question: For the older generation of access keys, is the account number derived from the access key ID, or is it possible that it is obtained by querying against an internal database?

3. What other information can be obtained from an access key ID?

Once the meaning of bits 0-60 of a new-generation access key ID had been identified, we wanted to know whether additional information could be obtained from the remaining bits. In particular, we wanted to know whether an access key ID could be mapped to the IAM user it was associated with. However, we have not yet found any obvious patterns indicating that this is the case. Because an AWS account can have a maximum of 5,000 IAM users and each user can have a maximum of 2 access keys, there is an upper limit of 10,000 IAM user access keys per account. This is small enough that access key IDs could plausibly just be stored in a database and looked up, without the user being encoded in the ID.

One potentially interesting observation we made when analysing a dataset of 3,000+ access key IDs (from different AWS accounts) was that bit 92 (0-indexed) is more likely to be 1 than 0: the ratio of 0s to 1s is approximately 1:3. There appears to be no correlation between this bit and the region, user, account or timestamp of the access key ID, so the reason for this bias is unclear. This observation holds true for temporary access keys as well, but not for the older generation of access key IDs.
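A per-position tally of this kind can be sketched as follows. It assumes every ID in the sample has the same length; bit_one_frequencies and biased_positions are hypothetical names, and the 0.3/0.7 thresholds are arbitrary choices for flagging positions, not values taken from our analysis. Fixed bits such as the identifier-type prefix (bits 0-19) will always be flagged, so they need to be set aside when reading the results.

```python
from typing import List

BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def key_id_to_bits(key_id: str) -> str:
    """Return the ID as a '0'/'1' string, 5 bits per character, MSB first."""
    return "".join(format(BASE32_ALPHABET.index(c), "05b") for c in key_id)


def bit_one_frequencies(key_ids: List[str]) -> List[float]:
    """Fraction of IDs that have a 1 at each bit position."""
    if not key_ids:
        raise ValueError("need at least one key ID")
    bit_strings = [key_id_to_bits(k) for k in key_ids]
    width = len(bit_strings[0])
    if any(len(b) != width for b in bit_strings):
        raise ValueError("all key IDs must have the same length")
    n = len(bit_strings)
    return [sum(b[i] == "1" for b in bit_strings) / n for i in range(width)]


def biased_positions(freqs: List[float], low: float = 0.3, high: float = 0.7) -> List[int]:
    """Positions whose frequency of 1s falls outside [low, high] (arbitrary thresholds)."""
    return [i for i, f in enumerate(freqs) if f < low or f > high]
```

For a rough sense of scale: if a bit were unbiased, the standard error of its observed frequency over 3,000 IDs would be about sqrt(0.25 / 3000) ≈ 0.009, so a frequency near 0.75 would sit roughly 27 standard errors away from 0.5.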

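Open question: Could the bias at bit position 92 be the result of a logical OR operation? Assuming two independent, uniformly random input bits, OR outputs 1 for three of the four possible input combinations; of AND, OR and XOR, it is the only one with this 3:1 bias towards 1 (although NAND would produce the same ratio). Otherwise, what is the reason for this bias?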
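The 3:1 figure assumes the two input bits are independent and uniformly random. Under that assumption, the sketch below counts how often each common two-input operation outputs 1, and then enumerates all 16 two-input Boolean functions to find every one with a 3:1 ratio. The enumeration finds four: OR, NAND, and the two implications (NOT a OR b, a OR NOT b).

```python
from itertools import product

INPUTS = list(product((0, 1), repeat=2))  # (a, b) in the order 00, 01, 10, 11

OPERATIONS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
    "XNOR": lambda a, b: 1 - (a ^ b),
}

# Number of input pairs (out of 4 equally likely ones) for which each operation outputs 1.
ones = {name: sum(op(a, b) for a, b in INPUTS) for name, op in OPERATIONS.items()}
print(ones)  # {'AND': 1, 'OR': 3, 'XOR': 2, 'NAND': 3, 'NOR': 1, 'XNOR': 2}

# Every two-input Boolean function is one 4-entry truth table over INPUTS.
three_to_one = [table for table in product((0, 1), repeat=4) if sum(table) == 3]
print(len(three_to_one))  # 4
```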

4. Other random observations

The STS:GetAccessKeyInfo API call doesn’t seem to check that every character in the AccessKeyId field is valid base32; it rejects only certain characters. For example, the value AKIA7@@@@@@@@@@@@@@@@@@@@@@@ fails with failed to satisfy constraint: Member must satisfy regular expression pattern: [\w]*. This implies that any word character, including _, is accepted. Passing a value of AKIA7_____________ returns an account number of 1030792151040. It appears that invalid characters (e.g., lowercase letters, underscores, or the digits 0, 1 and 8) are ignored, while valid characters are still decoded according to their position.
Despite AWS’ statement that an access key ID can be up to 128 characters long [6], passing any AccessKeyId longer than 25 characters to STS:GetAccessKeyInfo results in a ValidationError. This differs from the error returned when the AccessKeyId is too short, which is Parameter validation failed: Invalid length for parameter AccessKeyId. This may provide clues as to what constitutes a valid access key.
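The account number returned for AKIA7_____________ in the first observation above is consistent with invalid characters contributing zero bits while keeping their positions. In base32, 7 is 11111, so bits 20-24 are all 1; if every underscore contributes 00000, bits 21-60 read as 1111 followed by 36 zeros, i.e. 15 × 2^36 = 1030792151040. The sketch below models that hypothesis; lenient_bits and lenient_account are hypothetical names, and the zero-for-invalid-characters rule is our guess at the behavior, not AWS’ documented implementation.

```python
BASE32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"  # RFC 4648 base32 alphabet


def lenient_bits(key_id: str) -> str:
    """Hypothetical model: valid base32 characters contribute their 5 bits; any
    character outside A-Z and 2-7 contributes 00000 but still occupies its position."""
    return "".join(
        format(BASE32_ALPHABET.index(c), "05b") if c in BASE32_ALPHABET else "00000"
        for c in key_id
    )


def lenient_account(key_id: str) -> int:
    """Bits 21-60 of the lenient bit string as an integer (missing bits count as 0)."""
    bits = lenient_bits(key_id).ljust(61, "0")
    return int(bits[21:61], 2)


print(lenient_account("AKIA7_____________"))  # 1030792151040
print(15 * 2**36)                             # 1030792151040
```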

References

1. AWS Security Credential Formats, https://summitroute.com/blog/2018/06/20/aws_security_credential_formats

2. AWS Access Key ID formats, https://awsteele.com/blog/2020/09/26/aws-access-key-format.html

3. A short note on AWS KEY ID, https://medium.com/@TalBeerySec/a-short-note-on-aws-key-id-f88cc4317489

4. Discussion within the Cloud Security Forum Slack channel (requires login and access), https://cloudsecurityforum.slack.com/archives/C6DN616HG/p1563994158050800

5. IAM identifiers: Unique identifiers, https://docs.aws.amazon.com/IAM/latest/UserGuide/reference_identifiers.html#identifiers-unique-ids

6. AccessKey, https://docs.aws.amazon.com/IAM/latest/APIReference/API_AccessKey.html