Columns > The Bikeshed


      view issue

LinkedIn Password Leak: Salt Their Hide

by Poul-Henning Kamp | June 7, 2012

Topic: Security

  • View Comments
  • Print

LinkedIn Password Leak: Salt Their Hide

If it does not take a full second to calculate the password hash, it is too weak.


Poul-Henning Kamp


6.5 million unsalted SHA1 hashed LinkedIn passwords have appeared in the criminal underground. There are two words in that sentence that should cause LinkedIn no end of concern: "unsalted" and "SHA1."

Let me walk through the process of password protection and explain why unsalted passwords are only infinitesimally better than plaintext passwords:

User Alice selects and enters a new password for use with some kind of computer service: "LIMaabotM." The computer needs to be able to check if Alice types the right password the next time she logs in, and so it stores the password in a file or database. This is obviously bad, because anybody with read-access, legitimate or illegitimate, can read Alice's password and login as Alice. So what smart people figured out in the 1970s was that you should store a one-way hash of the password instead.

When somebody tries to log in as Alice, the password that person enters is run through the same one-way hash, and if the output matches what is in the database, that person knew the right password. This works great, and now Evil Sysadmin Bob and Industrial Spy Charles can no longer read Alice's password. But they can still try to guess it. If they run all possible passwords they can imagine through the same one-way function, and one of the outputs is the same as the value the database has for Alice, they have found something that will work as a password for Alice's account.

An important detail here is that they can do that on any computer or computer-cluster they have access to without anybody noticing and that is apparently why the password file was generated: the criminals needed help guessing the passwords.

This is the first place LinkedIn failed utterly: Calculating the SHA1 function is very, very fast. A regular computer can crunch from 10 million to 100 million of them per second using the GPU, making it a trivial task to check even very large lists of potential passwords. The obvious way to make this harder is to make the calculation take more time, so instead of:

scrambled_password = HASH(password)

we could make it:

scrambled_password = HASH(password)
       for (i = 0; i < 1000; i++)
scrambled_password = HASH(scrambled_password)

In all probability, 1,000 is far too small a value and should be replaced with some suitably large number so that the loop takes a full second on a fast computer. A second is a minor detail for the user logging in, but slows the attackers' brute-force attack by a factor of 10,000,000 or more.

But we're not done.

On a system with many users, the chances that some of them have chosen the same password are pretty good. Humans are notoriously lousy at selecting good passwords. For the evil attacker, that means all users who have the same hashed password in the database have chosen the same password, so it is probably not a very good one, and the attacker can target that with a brute force attempt.

This became a problem and was solved in the 1980s: To fix the problem, we have to make sure that users do not have the same hash-value stored, even if they have the same password. And that is what a salt does: When a user changes her password, the system picks a random number or string X, and calculates

scrambled_password = SLOW_HASH(X + password)

and stores X along with the scrambled password in the database.

Even if all users choose "LIMaabotM" as their password, they all get a different X, and, therefore, different scrambled passwords, and the attacker will only discover this by the time he bruteforces the second or third password, and starts to detect a pattern.

LinkedIn didn't do this either.

The 6.5 million hashed passwords in all likelihood represent far more users than that, because everybody who chose "qwer5678" as a password shares a single entry in that file. I wouldn't be surprised if 6.5 million was the number of unique passwords all LinkedIn users have chosen, as there is, after all, only so much imagination to go around.

The fact that the leaked file does not have usernames attached is no consolation, it just means that whoever leaked it knows the value of what they have, and wants to keep the haul for themselves.

But we are still not done:

By the 1990s, people had started to experiment with putting computing into programmable chips, which means that a password protected with the standard UNIX crypt(3) function could be bruteforced in far too short time. The fix I applied back then, was to switch to a much more complex and hardware-unfriendly hashing function based on MD5. It got the name "md5crypt," and is probably one of the most widely used password scramblers today, which just goes to show that back then I totally failed to communicate my central insight. So let me again:

You can change your SLOW_HASH() function any time you want, provided you also store which function was used in addition to the salt and the scrambled password. This means that as computers get faster, you can increase the complexity of the function to slow it down, or if somebody finds an analytical attack that weakens your password scrambler, you can switch to a stronger one. The next time your users change their passwords, the extra strength kicks in. This property of password protection is what will allow LinkedIn to upgrade their password storage security now, without forcing all users to change their passwords at the same instant. As I said, md5crypt is pretty much the "default" password scrambler for a lot of people, but even though it fulfilled all relevant criteria back in 1995, I no longer consider it safe enough (see: http://phk.freebsd.dk/sagas/md5crypt_eol.html).

Recent research has managed to get a GPU processor very close to 1,000,000 md5crypt operations per second (see: http://2012.sharcs.org/slides/sprengers.pdf) and that means that any 8-character password can be bruteforced in approximately two days.

Conclusion

LinkedIn is learning fast right now, according to their damage control missives, they have now implemented salting and "better hashing." But we have yet to find out why nobody objected to them protecting 150+ million user passwords with 1970s methods.

And everybody else should take notice too: Even if you use md5crypt, you should upgrade your password scrambling algorithm. As a rule of thumb: If it does not take a full second to calculate the password hash, it is too weak.

LOVE IT, HATE IT? LET US KNOW

feedback@queue.acm.org

Poul-Henning Kamp (phk@FreeBSD.org) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as "under-the-hood" building blocks in both open source and commercial products. His most recent project is the Varnish HTTP accelerator, which is used to speed up large Web sites such as Facebook.

© 2012 ACM 1542-7730/11/0600 $10.00

acmqueue

Originally published in Queue vol. 10, no. 6
see this item in the ACM Digital Library

Back to top

  • POUL-HENNING KAMP (phk@FreeBSD.org) is one of the primary developers of the FreeBSD operating system, which he has worked on from the very beginning. He is widely unknown for his MD5-based password scrambler, which protects the passwords on Cisco routers, Juniper routers, and Linux and BSD systems. Some people have noticed that he wrote a memory allocator, a device file system, and a disk encryption method that is actually usable. Kamp lives in Denmark with his wife, his son, his daughter, about a dozen FreeBSD computers, and one of the world's most precise NTP (Network Time Protocol) clocks. He makes a living as an independent contractor doing all sorts of stuff with computers and networks.

    For additional information see the ACM Digital Library Author Page for: Poul-Henning Kamp
     

Comments

  • Ken | Fri, 08 Jun 2012 08:06:08 UTC

    "You can change your SLOW_HASH() function any time you want, provided you also store which function was used in addition to the salt and the scrambled password. This means that as computers get faster, you can increase the complexity of the function to slow it down, or if somebody finds an analytical attack that weakens your password scrambler, you can switch to a stronger one. The next time your users change their passwords, the extra strength kicks in."
    
    Isn't the situation then even better than that?, i.e., you can upgrade a user's stored password the next type they *type* it, the next time they log in to your service.  That's probably a lot sooner than the next time users *change* their passwords!
  • PB | Fri, 08 Jun 2012 09:08:17 UTC

    Could it be changed even sooner than that by generating salted hashes of the stored hashes? Then when someone logs in you can simply try both?
  • Matt | Fri, 08 Jun 2012 09:13:43 UTC

    There's something more to the linkedin leak which I haven't seen an explanation for.
    
    I checked the file of password hashes for the hash of my password. It's there. It's marked as cracked (some of the hashes start with 00000, this seems to mean 'this one is cracked'). My password is strong, an 11 character string I generated from like this: dd if=/dev/random bs=8 count=1 | base64
    
    I can think of at least three explanations:
    
    1. My password isn't actually cracked, 00000 means something else.
    
    2. The person who did the cracking had access to more information than just a hash. Perhaps linkedin keep plaintext passwords for at least some accounts. Perhaps there's was a log somewhere with some plaintext passwords. Perhaps the hack was larger than just obtaining hashed passwords.
    
    3. It's easier to brute-force SHA-1 than I think.
  • Poul-Henning Kamp | Fri, 08 Jun 2012 09:20:55 UTC

    Good security would dictate that only the application where the user changes password have write access to the database, any application just checking the password should only need and have read-only access.
    
    I realize that in many web-based systems that distinction does not exist, and for that subset you are right.
    
  • Poul-Henning Kamp | Fri, 08 Jun 2012 09:27:52 UTC

    @Matt: I think you overestimate the strength of a password created that way.
    
    Because LinkedIn did not salt their passwords, the criminals could exploit the preexisting and precalculated 'rainbow' tables for SHA1, and I certainly rule out that they have reached a depth where such a password might be therein.
    
    One other thing people overlook when generating passwords the way you propose, is that it is perfectly possible that base64 spits out the danish word "Angstskrig" which falls to trivial dictionary attacks.
  • Rowan | Fri, 08 Jun 2012 11:09:24 UTC

    @Matt perhaps due to a SHA-1 collision means your password has the same hash as a much simpler password? Also by using base64 you're reducing the search space for a hacker as there are no symbol characters
  • Matt | Fri, 08 Jun 2012 11:38:49 UTC

    @Rowan: I'm reducing it compared to 11 random 8 bit characters, but I'm not reducing it compared to the 64 bits of (hopefully) random data I started with.
    
    Perhaps your underlying point (and Poul's) is that 64 bits is not enough, especially when there is no salt.
    
    How much is enough? I'm not an expert in the field, but taking the figure in the article above, i.e. "1,000,000 md5crypt operations per second ... means that any 8-character password can be bruteforced in approximately two days." and guessing that the calculation was
    
       26 ^ 8 = 208827064576                               // number of uppercase 8 letter passwords
       208827064576 / (1000000 * 3600 * 24) = 2.4 days
    
    then using the same reasoning for 64 random bits gets me
    
      2^64 / 1000000 = 600000 years
    
    Seems like it shouldn't have been crackable. And mere 'bad luck' seems far too unlikely. Where am I getting the numbers wrong?
  • Ralph Corderoy | Fri, 08 Jun 2012 11:39:33 UTC

    It would seem the password database could be upgraded instantly if the method is added too or already stored as it can record that an additional hashing step is being used to the existing hash in the DB, e.g. if md5crypt was used for all of them then md5crypt+scrypt would mean the next password entered by the user should be run through the old md5crypt and that result put through scrypt.  If it matches then the more direct scrypt version can be written back to the DB.
    
    scrypt is aimed at being stronger than bcrypt.  http://www.tarsnap.com/scrypt.html
  • Christian | Fri, 08 Jun 2012 13:32:53 UTC

    I believe that those of you asking "why not just accept both passwords" or thinking that you can "upgrade" the storage of the old passwords are mistaken for at least one obvious reason: the cat's already out of the bag; the unsalted, trivial hashes are already out there. If you transparently allow people to use either password or upgrade the hash without enforcing a password change, their account is still extremely vulnerable. An enforced password change is absolutely necessary to invalidate the compromised password data.
    
    On the other hand, the "upgrade password hashing at next login" could be perfectly viable if the service had not already been compromised.
  • Mike | Fri, 08 Jun 2012 14:25:54 UTC

    @Matt -
    
    I think one thing that you're missing is, as was mentioned- rainbow tables. These tables usually already exist for most hash functions (I'm sure they're available for SHA1). They took a long time to create initially, but that upfront work has already been done.
    
    Since they were unsalted, you can snag a rainbow table offline for SHA1 and go to town. There are many good articles talking about them, but he's the obligatory CH: http://www.codinghorror.com/blog/2007/09/rainbow-hash-cracking.html
  • Daniel Turner | Fri, 08 Jun 2012 15:17:45 UTC

    Not only are rainbow tables a very feasible attack on MD5 hashed passwords, it should also be noted that there are now ways of attacking an MD5 hashed password that does not rely on just simply generating the MD5 of evey possible password.
    
    Flame was recently found to have an entirely new attack against MD5 coded into it.
  • Aleksey Korzun | Fri, 08 Jun 2012 16:01:38 UTC

    What they basically did is salted hashes now and disabled all of the current passwords so the users are forced to update their 'encrypted' hashes with new salted hash.
    
    The thing that worries me, while that is great.. how did the attackers get the data in the first place? What else were they able to obtain? LinkedIn is not really disclosing any of that information at this point (as far as I'm aware).
  • anon | Fri, 08 Jun 2012 16:19:27 UTC

    @Matt: I read somewhere that the attackers had changed the login page...
  • Matt | Fri, 08 Jun 2012 16:24:21 UTC

    @Mike
    
    Poul commented above "I certainly rule out that they [rainbow tables] have reached a depth where such a password might be therein".
    
    I wonder whether maybe Poul dropped a "cannot", i.e. wrote the opposite of what he meant.
    
    The password I have is 11 characters long, lower, upper and a number. The rainbow tables I found appear to be limited to passwords up to 7 characters long, i.e. no use:
    
       http://freerainbowtables.mirror.garr.it/mirrors/freerainbowtables/sha1/
      
    but maybe there are bigger tables in circulation. How long does it take to crack the password I described, i.e. 11 characters, with such a table?
  • Poul-Henning Kamp | Fri, 08 Jun 2012 16:50:06 UTC

    @Daniel:  Look at md5crypt source code, none of current MD5 attacks are relevant.
    
    @Matt: yes I did: I will not rule out such rainbow tables.
    
    Remember criminals have access to 100.000+ machine botnets.  1GB each place ("WINDOWS_UPDATE.DAT") is a LOT of storage.
  • Barry | Fri, 08 Jun 2012 18:17:58 UTC

    Easy salting:
    Hash the user ID, hash the password, hash a third value known only to the application, exclusive-or the three, store that.
  • David F. Skoll | Fri, 08 Jun 2012 18:22:25 UTC

    I no longer have a LinkedIn account.  When I did, the associated email address was t99ef724coxc3omn@la.roaringpenguin.com (which was randomly generated).  My 16-character randomly-generated password was not in the list.
    
    Interestingly, on May 10th 2012, the email address t99ef724coxc3omn@la.roaringpenguin.com started receiving spam attempts, in spite of the fact that *nobody but LinkedIn* was supposed to know it.
    
    From this, I believe that LinkedIn was compromised at least as early as May 10th and that the attackers have login names as well as passwords.
    
  • David F. Skoll | Fri, 08 Jun 2012 18:25:18 UTC

    @Barry: Your "easy salting" is not that secure.  The User-ID is likely to be known, so its hash can be precalculated.  If the secret value leaks out, that can be precalculated too, so you're back to simply breaking the hash.
    
    For more secure salting, you want something like:
    
    hash(password | random_salt | secret_salt)
    
    where the pipe symbol denotes concatenation.  That way, the attacker can't easily precompute anything.
  • Jay | Fri, 08 Jun 2012 19:45:25 UTC

    A coder at one of my former employers salted the passwords but he did it after the hash was applied. I let him know repeatedly it wasn't a good idea. If an expert mentioned it in a public forum perhaps it might carry more weight than just my words...
  • decora | Fri, 08 Jun 2012 20:05:16 UTC

    Because management at Linkedin obviously didnt value security. They could easily have hired a person to try to analyze their own system and find faults in it. This is an extremely basic one. They would rather spend the money on whatever they spent it on (executive bonuses? advertising?) It went IPO in 2011.. how much of their income went to hedge funders and investment banks instead of into a kid programmer or two who could have found this flaw in a week and explained the fix? 
    
    
    
    
  • Bill | Fri, 08 Jun 2012 21:53:28 UTC

    The LinkedIn blog says, 'Finally, our current production database for account passwords is salted as well as hashed, which provides an additional layer of security.'.
    
    This sentence is odd.  The passwords themselves are not discussed, but their "production database", whatever that is, is salted now.
    
  • Basic Security | Fri, 08 Jun 2012 22:46:09 UTC

    Basic password security these days:
    
    When users select a password, check whatever the user submitted against a set of common password dictionaries - ideally, apply some rules to it (733t speak translation, add numbers or symbols to the beginning or end, capitalization games, etc.). This should help prevent P@$$w0rd (8 characters, upper case, lower case, numbers, symbols - it must be a great password!)
    
    Also note that attacking SHA1, a computer full of top end GPU's can crack in excess of 15 billion (15,000,000,000) single SHA1 hashes per second, for about $3,500.
  • Ross | Fri, 08 Jun 2012 23:01:53 UTC

    No matter how strong you make the hash, there is another fundamental problem: storing all of the hashes together in a single system.  An attacker who finds a single hole gets access to all the treasure.
    
    Nuclear missile silos required two keys turned by two people to fire the missile.  What if you could incorporate a two-party system into hash storage? Cut the hash in half, and store it in two different systems, physically separate, and designed and built by two independent teams using diverse techniques.  Thus an attacker would have to compromise two different systems to gain any usable data.  Yes, this would be a lot of added complexity and expense, but since we have reached the level of storing credentials numbering in hundreds of millions, perhaps it is justified.
  • Jon | Sat, 09 Jun 2012 00:50:08 UTC

    @Ross - My co-workers and I were discussing this. One of their salts was actually stored in a file on the production machine, and never entered into the database, nor into version control, to keep the system from being compromised if either was touched by malicious entities. 
    
    A particularly paranoid solution might be to store one solely in the DB, one solely in the VC, and one solely in production. Even within an organization, it is possible to do the 'Missile Launch Keys' thing and spread your salts wide. 
  • Andrew | Sat, 09 Jun 2012 01:15:42 UTC

    I worked for a healthcare technology company once where all the production servers and databases shared a *single* password which was kept in a text file in the code repository. Even though I pointed out how stupid this was to the CTO he remarked "we trust our employees" and nothing changed. Probably LinkedIn had people who were horrified at what was being done but the leadership didn't care.
  • RobD | Sat, 09 Jun 2012 01:29:50 UTC

    No salt was their achilles heal.  The salt also should be using the full character set.  A 16 char salt plus the password would make rainbow tables unusable except for a single user and brute force would take a long time and be worse if itterated.  Most people are saying the password file looks old.
    
    So itterate your salt+password.
    Use a full character set salt to extend the password.
    
  • Mike | Sat, 09 Jun 2012 07:38:16 UTC

    @Matt:
    
    In your calculation where you came up with 600,000 years, you used the ops/sec of md5crypt (1,000,000/sec)
    
    LinkedIn was not using md5crypt, but SHA1, which can be brute forced much faster. 
    
    You are also assuming a single computer.  Presumably the attackers would have a botnet with tens of thousands of computers or more.
  • Ben | Sat, 09 Jun 2012 12:37:52 UTC

    "The 6.5 million hashed passwords in all likelihood represent far more users than that, because everybody who chose "qwer5678" as a password shares a single entry in that file."
    
    So how are the password statistics concerning this leak even possible when no other information but those pw hashes are available?
  • Nate | Sun, 10 Jun 2012 04:37:30 UTC

    If you had an existing salted, SHA1 password database, couldn't you convert everyone instantly to a new combined algorithm that first computes the salted SHA1 and then passes it through bcrypt using the same or (even better) a second salt.  Then, assuming your system is un-compromised, you'd have a very secure password hash without requiring users to log in or change passwords.
    
    Any opinions on whether that would work? Seems like it would in my opinion.  I'd rather not wait for users to log in/change passwords to ensure the password db could not be compromised by brute force.  The one I'm dealing with is already salted, but uses SHA1 and I'd like to take it to the highest standard. 
  • shackrock | Sun, 10 Jun 2012 15:36:59 UTC

    I always have used this webpage as a guide: http://elbertf.com/2010/01/store-passwords-safely-with-php-and-mysql/ - which seems to cover all of your statements from the article, agreed?
  • Anon | Tue, 12 Jun 2012 00:05:48 UTC

    "Recent research has managed to get a GPU processor very close to 1,000,000 md5crypt operations per second ... and that means that any 8-character password can be bruteforced in approximately two days."
    
    Your calculation assumes an 8-character lowercase a-z password, not "any 8-character password". If you assume that your password can be made from the 52 upper and lower case letters, the 10 digits, the 32 keyboard symbols, and spaces, your time goes from 2 days to 200 years.
  • Poul-Henning Kamp | Tue, 12 Jun 2012 10:27:54 UTC

    @Anon:
    
    Please see the SHARCS conference presentation linked to in the article for stats.
  • Michael | Tue, 12 Jun 2012 15:32:57 UTC

    @PB | Fri, 08 Jun 2012 09:08:17 UTC 
    "Could it be changed even sooner than that by generating salted hashes of the stored hashes? Then when someone logs in you can simply try both?"
    
    Shouldn't the hash and not the password be transmitted over the internet? This also distributes the computation issue to all user computers instead of a single webserver.
  • Cory House | Wed, 13 Jun 2012 15:37:24 UTC

    @Michael - Login should use https to assure sending the plain text password is safe. The password must be hashed on the server to avoid exposing the hashing algorithm and salt in client-side code. 
  • Michael | Thu, 14 Jun 2012 09:45:17 UTC

    @Cory House:
    Eh, the salt and algorithm should not need to be secret. The password should. If the channel is safe enough to transmit a plain text password, it is also secure enough to transmit a hash of that password.
    
    If the channel is being sniffed, it is also better that a hash is discovered, which is only valid for a single website, due to the salt, while a password is likely valid for many websites.
  • bendra | Tue, 17 Jul 2012 22:04:51 UTC

    Hello,
    
    Interesting article and comments!  One thing this has me wondering is: if GPUs are so darn good at hashing, is there anyone who is using GPU-based hashing on the server to log users in?  It would take quite a load off the server, particularly if you are using a slow adaptive hash no?
Leave this field empty

Post a Comment:

(Required)
(Required)
(Required - 4,000 character limit - HTML syntax is not allowed and will be removed)