In one of the biggest password re-use studies of its kind, an analysis of more than one billion leaked credentials has discovered that one out of every 142 passwords is the classic “123456” string.
The study, carried out last month by computer engineering student Ata Hakçıl, analyzed username and password combinations that leaked online after data breaches at various companies.
These “data dumps” have been around for more than half a decade, and have been piling up as more companies get hacked.
The data dumps are easily available online, on sites like GitHub or GitLab, or freely distributed via hacking forums and file-sharing portals.
Over the years, tech companies have been collecting these data dumps. For example, Google, Microsoft, and Apple have collected leaked credentials to create in-house alert systems that warn users when they’re using a “weak” or “common” password.
The Have I Been Pwned online service is also built on top of these leaked data dumps and credentials.
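As a rough illustration of how such a warning could work, the sketch below checks a candidate password against a small set of hashes of known leaked passwords. The names `sha1_hex`, `LEAKED_HASHES`, and `is_known_leaked` are hypothetical, the three-entry blocklist is made up for the example, and nothing here is meant to describe how any of the companies above actually implement their systems.

```python
import hashlib


def sha1_hex(password: str) -> str:
    """Return the uppercase hex SHA-1 digest of a password."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()


# Hypothetical blocklist: a real system would load hashes derived
# from leaked data dumps rather than a hard-coded sample.
LEAKED_HASHES = {sha1_hex(p) for p in ("123456", "password", "qwerty")}


def is_known_leaked(password: str) -> bool:
    """True if the password's hash appears in the blocklist."""
    return sha1_hex(password) in LEAKED_HASHES


if __name__ == "__main__":
    for candidate in ("123456", "correct horse battery staple"):
        status = "warn user" if is_known_leaked(candidate) else "ok"
        print(f"{candidate!r}: {status}")
```

Storing hashes instead of the plaintext strings is a design choice for this sketch: the lookup works the same way, but the list itself does not expose the leaked passwords directly.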
Study results
Last month, Hakçıl, a Turkish student studying at a university in Cyprus, downloaded and analyzed more than one billion leaked credentials.
The main discovery was that the dataset of more than one billion credentials included only 168,919,919 unique passwords, and that the “123456” string alone appeared more than 7 million times.
This means that one out of every 142 passwords in the sample Hakçıl analyzed was one of the weakest passwords in use today. The “123456” string has been the most commonly reused password online for the past five years in a row, and counting.
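The “one in 142” figure can be reproduced with simple division. The sketch below assumes the rounded inputs of about 7 million occurrences per billion credentials; the helper name `one_in_n` is hypothetical. Using the study's more precise 0.722% share gives a slightly smaller number, so the ratio is best read as an approximation.

```python
def one_in_n(occurrences: int, total: int) -> float:
    """Return N such that an item appears about once in every N entries."""
    if occurrences <= 0 or total <= 0:
        raise ValueError("occurrences and total must be positive")
    return total / occurrences


# Rounded figures (assumption): ~7 million hits per 1 billion credentials.
rounded = one_in_n(7_000_000, 1_000_000_000)
print(f"1 in {int(rounded)}")  # prints "1 in 142"

# The study's reported share of 0.722% of all passwords.
from_share = 100 / 0.722
print(f"1 in {from_share:.1f}")  # prints "1 in 138.5"
```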
In addition, Hakçıl found that the average password length is 9.48 characters, which isn’t good, but isn’t terrible either: most security experts recommend using passwords that are as long as possible, usually in the range of 16 to 24 characters or more.
But password length was not the only issue Hakçıl discovered. The Turkish researcher said that password complexity was also a problem, with only 12% of the passwords containing a special character.
In many cases, users chose simplistic passwords made up of only letters (29%) or only numbers (13%). This means that around 42% of all the passwords included in the one-billion dataset were vulnerable to quick dictionary attacks that would allow threat actors to gain access to accounts with little effort or technical difficulty.
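To make the categories concrete, here is a minimal sketch of how passwords could be sorted into these classes. It is not the study's code: the function names and category labels are hypothetical, and it assumes that a “special character” is anything that is neither a letter nor a digit, which may differ from the definition Hakçıl used.

```python
from collections import Counter


def classify(password: str) -> str:
    """Assign a password to one coarse character-class category."""
    if password == "":
        return "empty"
    if password.isdigit():
        return "digits_only"
    if password.isalpha():
        return "letters_only"
    if password.isalnum():
        return "letters_and_digits"
    # Assumption: anything else contains at least one special character.
    return "has_special"


def class_shares(passwords: list[str]) -> dict[str, float]:
    """Return the fraction of passwords that falls into each category."""
    counts = Counter(classify(p) for p in passwords)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


if __name__ == "__main__":
    # Made-up sample, not data from the study.
    sample = ["123456", "password", "qwerty", "abc123", "P@ssw0rd",
              "111111", "letmein", "iloveyou1", "dragon", "hunter2!"]
    for category, share in sorted(class_shares(sample).items()):
        print(f"{category}: {share:.0%}")
```

Passwords in the first two categories are the ones the paragraph above singles out, since a dictionary or a short numeric sweep covers them quickly.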
The study’s full results are available on GitHub, with a short summary below:
- From 1,000,000,000+ lines of dumps, 257,669,588 were filtered out as either corrupt data (gibberish in improper format) or test accounts.
- 1 billion credentials boil down to 168,919,919 passwords and 393,386,953 usernames.
- The most common password is 123456. It covers roughly 0.722% of all the passwords (around 7 million times per billion).
- The 1,000 most common passwords cover 6.607% of all the passwords.
- With the 1 million most common passwords, the hit rate is 36.28%, and with the 10 million most common passwords, the hit rate is 54.00%.
- Average password length is 9.4822 characters.
- 12.04% of passwords contain special characters.
- 28.79% of passwords are letters only.
- 26.16% of passwords are lowercase only.
- 13.37% of passwords are numbers only.
- 34.41% of all passwords end with digits, but only 4.522% of all passwords start with digits.
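The statistics in this summary are straightforward to compute from a list of passwords. The sketch below shows one possible way to do it, using a made-up ten-entry sample rather than the study's data; all function names are hypothetical, and the study's own scripts may work differently.

```python
from collections import Counter


def top_n_coverage(passwords: list[str], n: int) -> float:
    """Fraction of all entries covered by the n most common passwords."""
    if not passwords:
        return 0.0
    counts = Counter(passwords)
    covered = sum(count for _, count in counts.most_common(n))
    return covered / len(passwords)


def average_length(passwords: list[str]) -> float:
    """Mean password length in characters."""
    return sum(len(p) for p in passwords) / len(passwords)


def share_ending_with_digit(passwords: list[str]) -> float:
    """Fraction of passwords whose last character is a digit."""
    return sum(p[-1:].isdigit() for p in passwords) / len(passwords)


def share_starting_with_digit(passwords: list[str]) -> float:
    """Fraction of passwords whose first character is a digit."""
    return sum(p[:1].isdigit() for p in passwords) / len(passwords)


if __name__ == "__main__":
    # Made-up sample with repeats, to mimic password reuse.
    sample = ["123456", "123456", "123456", "password", "password",
              "qwerty", "abc123", "letmein1", "7days", "dragon"]
    print("unique passwords:", len(set(sample)))
    print("top-1 coverage:", top_n_coverage(sample, 1))
    print("top-2 coverage:", top_n_coverage(sample, 2))
    print("average length:", average_length(sample))
    print("ends with digit:", share_ending_with_digit(sample))
    print("starts with digit:", share_starting_with_digit(sample))
```

The top-N coverage is the same quantity as the “hit rate” in the summary: it estimates how many accounts an attacker would unlock by trying only the N most common passwords.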