The Bike Shed

 

LinkedIn Password Leak: Salt Their Hide

If it does not take a full second to calculate the password hash, it is too weak.


Poul-Henning Kamp


6.5 million unsalted, SHA1-hashed LinkedIn passwords have appeared in the criminal underground. There are two words in that sentence that should cause LinkedIn no end of concern: "unsalted" and "SHA1."

Let me walk through the process of password protection and explain why unsalted password hashes are only infinitesimally better than plaintext passwords:

User Alice selects and enters a new password for use with some kind of computer service: "LIMaabotM." The computer needs to be able to check whether Alice types the right password the next time she logs in, so it stores the password in a file or database. This is obviously bad, because anybody with read access, legitimate or illegitimate, can read Alice's password and log in as Alice. So what smart people figured out in the 1970s was that you should store a one-way hash of the password instead.

When somebody tries to log in as Alice, the password that person enters is run through the same one-way hash, and if the output matches what is in the database, that person knew the right password. This works great, and now Evil Sysadmin Bob and Industrial Spy Charles can no longer read Alice's password. But they can still try to guess it. If they run all possible passwords they can imagine through the same one-way function, and one of the outputs is the same as the value the database has for Alice, they have found something that will work as a password for Alice's account.
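For concreteness, here is a minimal sketch of that basic scheme in C, assuming OpenSSL's libcrypto for the SHA1 function (link with -lcrypto). It illustrates the unsalted hashing LinkedIn apparently used, not a recommendation; the hex string is what would sit in the database.

#include <stdio.h>
#include <string.h>
#include <openssl/sha.h>

/* One-way scramble: the database stores only the hex digest, never the password. */
static void
scramble(const char *password, char out[2 * SHA_DIGEST_LENGTH + 1])
{
        unsigned char digest[SHA_DIGEST_LENGTH];
        int i;

        SHA1((const unsigned char *)password, strlen(password), digest);
        for (i = 0; i < SHA_DIGEST_LENGTH; i++)
                sprintf(out + 2 * i, "%02x", digest[i]);
}

int
main(void)
{
        char stored[2 * SHA_DIGEST_LENGTH + 1];   /* written at password-change time */
        char attempt[2 * SHA_DIGEST_LENGTH + 1];  /* computed at login time */

        scramble("LIMaabotM", stored);
        scramble("LIMaabotM", attempt);
        printf("%s\n", strcmp(stored, attempt) == 0 ? "password accepted" : "password rejected");
        return (0);
}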

An important detail here is that they can do this on any computer or computer cluster they have access to, without anybody noticing, and that is apparently why the password file was leaked: the criminals needed help guessing the passwords.

This is the first place LinkedIn failed utterly: Calculating the SHA1 function is very, very fast. A regular computer can crunch from 10 million to 100 million of them per second using the GPU, making it a trivial task to check even very large lists of potential passwords. The obvious way to make this harder is to make the calculation take more time, so instead of:

scrambled_password = HASH(password)

we could make it:

scrambled_password = HASH(password)
for (i = 0; i < 1000; i++)
        scrambled_password = HASH(scrambled_password)

In all probability, 1,000 is far too small a value; it should be replaced with some suitably large number so that the loop takes a full second on a fast computer. A second is a minor detail for the user logging in, but it slows the attackers' brute-force attack by a factor of 10,000,000 or more.
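Here is a minimal sketch of that stretching loop in real code, again assuming OpenSSL's SHA1 purely for illustration (the point is the loop, not the choice of hash); the rounds parameter is the knob you tune until one call takes about a second on your own hardware.

#include <string.h>
#include <openssl/sha.h>

/* Stretched hash: run the digest back through the hash 'rounds' times. */
static void
slow_hash(const char *password, unsigned long rounds,
    unsigned char digest[SHA_DIGEST_LENGTH])
{
        unsigned char tmp[SHA_DIGEST_LENGTH];
        unsigned long i;

        SHA1((const unsigned char *)password, strlen(password), digest);
        for (i = 0; i < rounds; i++) {
                SHA1(digest, SHA_DIGEST_LENGTH, tmp);
                memcpy(digest, tmp, SHA_DIGEST_LENGTH);
        }
}

With SHA1 running at millions of operations per second on an ordinary CPU, a round count somewhere in the millions to tens of millions is what it takes to reach one second today, and it has to be revisited as hardware gets faster.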

But we're not done.

On a system with many users, the chances that some of them have chosen the same password are pretty good. Humans are notoriously lousy at selecting good passwords. For the attacker, that means all users who share the same hashed password in the database have chosen the same password, so it is probably not a very good one, and the attacker can target it with a brute-force attempt.

This became a problem and was solved in the 1980s: we have to make sure that users do not end up with the same stored hash value, even if they have chosen the same password. That is what a salt does: when a user changes her password, the system picks a random number or string X and calculates

scrambled_password = SLOW_HASH(X + password)

and stores X along with the scrambled password in the database.
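As a concrete illustration, assuming OpenSSL again, the sketch below uses PBKDF2 as a stand-in for SLOW_HASH(); nothing here prescribes that particular function, but it happens to fold the salt and the stretching into one well-studied call.

#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/rand.h>

int
main(void)
{
        const char *password = "LIMaabotM";
        unsigned char salt[16];        /* X: random per user, stored in the clear */
        unsigned char scrambled[32];   /* what actually goes into the database */
        int i;

        if (RAND_bytes(salt, sizeof salt) != 1)
                return (1);

        /* 200000 iterations is illustrative; tune it until this takes ~1 second. */
        PKCS5_PBKDF2_HMAC(password, (int)strlen(password), salt, (int)sizeof salt,
            200000, EVP_sha256(), (int)sizeof scrambled, scrambled);

        for (i = 0; i < (int)sizeof scrambled; i++)
                printf("%02x", scrambled[i]);
        printf("\n");
        return (0);
}

Note that the salt is not a secret; it sits right next to the scrambled password in the database. Its only job is to make identical passwords scramble differently.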

Even if all users choose "LIMaabotM" as their password, they all get a different X and, therefore, different scrambled passwords, and the attacker will discover this only when he has brute-forced the second or third password and starts to detect a pattern.

LinkedIn didn't do this either.

The 6.5 million hashed passwords in all likelihood represent far more users than that, because everybody who chose "qwer5678" as a password shares a single entry in that file. I wouldn't be surprised if 6.5 million were the number of unique passwords all LinkedIn users have chosen; there is, after all, only so much imagination to go around.

The fact that the leaked file does not have usernames attached is no consolation; it just means that whoever leaked it knows the value of what they have and wants to keep the haul for themselves.

But we are still not done:

By the 1990s, people had started to experiment with putting computation into programmable chips, which meant that a password protected with the standard UNIX crypt(3) function could be brute-forced in far too short a time. The fix I applied back then was to switch to a much more complex and hardware-unfriendly hashing function based on MD5. It got the name "md5crypt" and is probably one of the most widely used password scramblers today, which just goes to show that back then I totally failed to communicate my central insight. So let me try again:

You can change your SLOW_HASH() function any time you want, provided you also store which function was used in addition to the salt and the scrambled password. This means that as computers get faster, you can increase the complexity of the function to slow it down, or if somebody finds an analytical attack that weakens your password scrambler, you can switch to a stronger one. The next time your users change their passwords, the extra strength kicks in. This property of password protection is what will allow LinkedIn to upgrade their password storage security now, without forcing all users to change their passwords at the same instant. As I said, md5crypt is pretty much the "default" password scrambler for a lot of people, but even though it fulfilled all relevant criteria back in 1995, I no longer consider it safe enough (see: http://phk.freebsd.dk/sagas/md5crypt_eol.html).
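The established way to record which function was used is to make the stored string self-describing. md5crypt, for example, emits strings that carry their own method identifier and salt, and later scramblers follow the same convention, so the verification code can look at the prefix and apply the right function at the right cost:

        $1$<salt>$<hash>               md5crypt
        $6$rounds=<N>$<salt>$<hash>    sha512crypt, with an explicit cost parameter
        $2b$<cost>$<salt and hash>     bcrypt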

Recent research has managed to get a GPU very close to 1,000,000 md5crypt operations per second (see: http://2012.sharcs.org/slides/sprengers.pdf), which means that any 8-character password can be brute-forced in approximately two days.
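To put that rate in perspective: if the attacker assumes lowercase letters only, an 8-character password has 26^8 ≈ 2.1 × 10^11 candidates, which at a million guesses per second is roughly two and a half days of work for a single GPU; richer character sets cost more per GPU, but the search parallelizes perfectly across as many GPUs as the attacker cares to rent.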

Conclusion

LinkedIn is learning fast right now; according to their damage-control missives, they have now implemented salting and "better hashing." But we have yet to find out why nobody objected to them protecting 150+ million user passwords with 1970s methods.

And everybody else should take notice too: Even if you use md5crypt, you should upgrade your password scrambling algorithm. As a rule of thumb: If it does not take a full second to calculate the password hash, it is too weak.

LOVE IT, HATE IT? LET US KNOW

[email protected]

Poul-Henning Kamp ([email protected]) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as "under-the-hood" building blocks in both open source and commercial products. His most recent project is the Varnish HTTP accelerator, which is used to speed up large Web sites such as Facebook.

© 2012 ACM 1542-7730/11/0600 $10.00





