Storing Passwords Safely
If your application deals with user accounts, it has to deal with passwords. Storing passwords in plain text is a bad idea: a single data breach could give an attacker access to every account. The obvious answer is to encrypt the passwords. However, using cryptography without understanding it can give you a false sense of security. Make the wrong choice, and you could make an attacker's job easier without realizing it. This article will get you up to speed on the best ways to use cryptography to protect passwords.
Fundamentals
When people think of encryption, they usually think of schemes that let you encrypt a file with a key and decrypt it with the same key or a different one. That is useful for most data, but not for passwords, because you don't want anyone to be able to decrypt them. Instead, use a one-way function, better known as a hash function.
For general-purpose hashing, the recommended algorithms today are SHA-2 and SHA-3. Many people still reach for MD5 and SHA-1, but both have known weaknesses, and neither is recommended for new applications. With hashing, you never store the password itself, only its hash. When a user logs in, you hash the password they provide and compare the result to the hash you stored earlier. If the two hashes match, the password is correct.
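To make the hash-and-compare idea concrete, here is a minimal PHP sketch using the built-in hash() function with SHA-256. The function name checkPassword is hypothetical, and the code is for illustration only: as the next paragraph explains, unsalted hashes like this are not safe for storing real passwords.

```php
<?php
// Illustration only: plain, unsalted SHA-256 demonstrates hash-and-compare,
// but it is NOT a safe way to store passwords.

// At registration: store only the hash, never the password itself.
$stored = hash('sha256', 'hunter2');

// At login: hash what the user typed and compare it to the stored hash.
// hash_equals() compares in constant time, so the check doesn't leak
// how many leading characters matched.
function checkPassword(string $attempt, string $stored): bool
{
    return hash_equals($stored, hash('sha256', $attempt));
}

var_dump(checkPassword('hunter2', $stored)); // bool(true)
var_dump(checkPassword('hunter3', $stored)); // bool(false)

// The weakness: the same password always produces the same hash,
// for every user and on every server.
var_dump(hash('sha256', 'hunter2') === $stored); // bool(true)
```

That last line is exactly what makes precomputed attacks possible.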
Simply hashing passwords isn't enough. Because the same password always produces the same hash, an attacker can compute the hashes of common passwords ahead of time and look for matches. Large lists of precomputed hashes, called rainbow tables, are widely passed around. To defeat them, add a salt (an arbitrary random string, generated separately for each password) to the password before hashing it. You store the salt alongside the hash. Since every hash now depends on its own salt, a precomputed table is useless, and the attacker has to start over for each salt.
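Here is a minimal sketch of salting in PHP, with SHA-256 as the underlying hash. The functions saltedHash and checkSalted are hypothetical names, and the 16-byte salt length is an illustrative choice. This still isn't how you should store passwords, because SHA-256 is too fast (see the next section), but it shows why a per-password salt defeats precomputed tables: two users with the same password end up with different stored hashes.

```php
<?php
// Illustration only: shows the effect of a per-password salt.
// Real code should use a dedicated password-hashing function instead.

// Hash a password with a fresh random salt; return both for storage.
function saltedHash(string $password): array
{
    $salt = bin2hex(random_bytes(16)); // 16 random bytes, hex-encoded
    return ['salt' => $salt, 'hash' => hash('sha256', $salt . $password)];
}

// Re-hash the attempt with the stored salt and compare in constant time.
function checkSalted(string $attempt, array $record): bool
{
    return hash_equals($record['hash'], hash('sha256', $record['salt'] . $attempt));
}

// Two users who choose the same password get different stored hashes,
// so a single precomputed table cannot cover both of them.
$alice = saltedHash('hunter2');
$bob   = saltedHash('hunter2');

var_dump($alice['hash'] === $bob['hash']); // bool(false): the salts differ
var_dump(checkSalted('hunter2', $alice));  // bool(true)
var_dump(checkSalted('hunter3', $alice));  // bool(false)
```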
Algorithms designed for passwords
Some hashing algorithms are designed specifically for passwords. Most general-purpose hash functions are built for speed as well as security, because they're often used to check the integrity of many large files. For passwords, you don't want speed. Checking a single login attempt against such a small amount of data doesn't need to be fast, and if it is fast, an attacker can try many more candidate passwords per second. Algorithms designed for passwords therefore slow themselves down on purpose, for example by running many iterations internally, and let you tune how expensive each hash is. The recommended password-hashing algorithms are argon2 and bcrypt. When you have the choice, prefer argon2: it is newer, reflects more recent cryptographic research, and can be configured to require a large amount of memory per hash, which makes it harder to attack with GPUs.
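In PHP, both algorithms are available through password_hash(): PASSWORD_BCRYPT always, and PASSWORD_ARGON2ID only when PHP was built with Argon2 support (PHP 7.3 or later). The sketch below prefers argon2id when it is available and sets the work factors explicitly. The specific cost values are assumptions for illustration, not recommendations; tune them so that one hash takes a noticeable but acceptable amount of time on your own hardware.

```php
<?php
// Sketch: prefer argon2id if this PHP build supports it, else use bcrypt.
// The cost values below are illustrative assumptions, not recommendations.

if (defined('PASSWORD_ARGON2ID')) {
    $algo = PASSWORD_ARGON2ID;
    $options = [
        'memory_cost' => 65536, // memory per hash, in KiB (64 MiB)
        'time_cost'   => 4,     // number of passes over that memory
        'threads'     => 1,     // degree of parallelism
    ];
} else {
    $algo = PASSWORD_BCRYPT;
    $options = ['cost' => 12];  // 2^12 iterations of bcrypt's key setup
}

// Time one hash to see what the chosen work factor costs on this machine.
$start = microtime(true);
$hash = password_hash('hunter2', $algo, $options);
$elapsed = microtime(true) - $start;

printf("%s took %.3f s\n", password_get_info($hash)['algoName'], $elapsed);
var_dump(password_verify('hunter2', $hash)); // bool(true)
```

Raising the cost makes each legitimate login slightly slower, but it multiplies the work for an attacker trying millions of guesses.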
It's worth noting that you shouldn't write your own crypto code unless you know exactly what you're doing. It's very easy to introduce subtle mistakes that compromise the security of the whole system, so whenever possible, leave the implementation to the experts. Most languages, libraries, and operating systems provide a ready-made facility for common tasks such as hashing passwords. Where they don't, they usually offer lower-level access to modern cryptographic algorithms; use those with care.
How to hash passwords in PHP
PHP provides built-in functions that make password hashing easy, secure, and hard to get wrong. You create password hashes with password_hash() and verify them with password_verify(). These functions are hard to misuse and do the right thing by default: PASSWORD_DEFAULT selects the algorithm PHP currently recommends, and that choice may move to a stronger algorithm in newer PHP versions. For example, let's hash a password:
```
In:  var_dump(password_hash("hunter2", PASSWORD_DEFAULT));
Out: string(60) "$2y$10$Hw1zOtl.AW0rXgNJAoZ2JufKoifJJM.8AcA3Ox/xe2D/uE4WE0F/."
```
When you call password_hash(), it returns a single string you can simply store in a database. That string packs in everything needed to verify the password later: the algorithm and its version ($2y$ means bcrypt), the cost (10 here), the salt, and the hash. PHP even generates a random salt for you, so you don't risk reusing salts; this is also why you'll get a different result every time you run the example. With PASSWORD_DEFAULT, it picks the algorithm for you too. Later, to verify a password, all you need is the password the user gave you and the stored result of password_hash(), both passed to password_verify():
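You never need to take this string apart yourself, because password_verify() reads every field for you, but doing so shows what actually gets stored. This sketch uses password_get_info() for the algorithm and options, then slices the example hash from above according to bcrypt's fixed 60-character layout. The substr() offsets assume a bcrypt hash and would not apply to other algorithms.

```php
<?php
// Take apart the example bcrypt hash. Layout (60 chars in total):
// "$2y$" + 2-digit cost + "$" + 22-char salt + 31-char hash.
$stored = '$2y$10$Hw1zOtl.AW0rXgNJAoZ2JufKoifJJM.8AcA3Ox/xe2D/uE4WE0F/.';

// password_get_info() reports the algorithm and its options.
$info = password_get_info($stored);
echo $info['algoName'], "\n";        // bcrypt
echo $info['options']['cost'], "\n"; // 10

// The salt and the hash itself follow the "$2y$10$" prefix.
$salt   = substr($stored, 7, 22);    // Hw1zOtl.AW0rXgNJAoZ2Ju
$digest = substr($stored, 29);       // fKoifJJM.8AcA3Ox/xe2D/uE4WE0F/.
echo $salt, "\n", $digest, "\n";
```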
```
In:  var_dump(password_verify("hunter2", '$2y$10$Hw1zOtl.AW0rXgNJAoZ2JufKoifJJM.8AcA3Ox/xe2D/uE4WE0F/.'));
Out: bool(true)
In:  var_dump(password_verify("hunter3", '$2y$10$Hw1zOtl.AW0rXgNJAoZ2JufKoifJJM.8AcA3Ox/xe2D/uE4WE0F/.'));
Out: bool(false)
```
With this approach, you never touch raw hashing primitives, so there's little risk of accidentally getting something wrong. Even if an attacker somehow gets hold of your user table, they can't easily recover the passwords from it.
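Putting it together, here is a sketch of a login check that also uses password_needs_rehash() to upgrade stored hashes once PASSWORD_DEFAULT or its default cost moves on. The $users array stands in for a real database table, and both it and the login() function are hypothetical. The stored hash is deliberately created with a low bcrypt cost of 8 to play the part of an old hash; don't use such a low cost in practice.

```php
<?php
// Hypothetical stand-in for a user table: username => stored hash.
// Cost 8 is deliberately weak here, to simulate an outdated hash.
$users = [
    'alice' => password_hash('hunter2', PASSWORD_BCRYPT, ['cost' => 8]),
];

// Hypothetical login check that upgrades outdated hashes on success.
function login(array &$users, string $name, string $password): bool
{
    if (!isset($users[$name]) || !password_verify($password, $users[$name])) {
        return false;
    }
    // The password is correct. If the stored hash uses an older algorithm
    // or weaker options than the current default, re-hash it now, while
    // the plain-text password is briefly available.
    if (password_needs_rehash($users[$name], PASSWORD_DEFAULT)) {
        $users[$name] = password_hash($password, PASSWORD_DEFAULT);
    }
    return true;
}

var_dump(login($users, 'alice', 'hunter3')); // bool(false)
var_dump(login($users, 'alice', 'hunter2')); // bool(true)
```

Rehashing can only happen after a successful password_verify(), because that is the only moment the application has the plain-text password. In a real application, you would also write the new hash back to the database.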
Protect your passwords the right way
By using trusted hashing functions that reflect current security practice, you can stay a step ahead of attackers who want your users' passwords. Don't try to write your own encryption or hashing functions: professional solutions are available, so use them.
If you would like more information, or if your application might benefit from a security audit, please get in touch.
Calvin,
I just wanted to let you know that I appreciate the time you put into this article, and to ask one question, or rather make one request for clarification.
When PHP's built-in functions password_hash() and password_verify() are used as described, and the output of password_hash() is stored in a database (output that's an encoded form of the salt and the other components, i.e. algorithm, version, and hash), doesn't this also potentially disclose the salt if the database is compromised or leaked? And, if so, wouldn't disclosure of the salt defeat its purpose as a mitigation against rainbow tables?
Regards,
Ben R
The salt is generated per password, not shared. (A shared salt is a common mistake in a lot of manual salting schemes.) You need that salt value to check that the password you received matches the value stored by password_hash. It's safe because the salt is different for each password: brute-forcing a single password is no easier than it would be without the salt, but a leak is far less catastrophic, because rainbow tables are no longer valid. An attacker would have to generate a new table for each salt value, which amounts to brute-forcing every hash individually.
Calvin,
Thanks for responding. Yes, it makes sense that since the salt for each password differs there’s an advantage as compared to a single-salt for all passwords. However, since the salt is stored in the database alongside the hash, encoded for use by password_verify(), isn’t the salt subject to the same disclosure risk/s as the hash such that both would be “equally” disclosed in database compromise/leak?
In other words, if the constituent parts of the hash (i.e. algorithm, version, salt, hash) are basically "encoded" and stored in the database, in some respects, aren't the keys and kingdom in the same jar? Maybe incorrectly, I've always regarded the salt as a kind of secret (original article seems to also: "…an arbitrary secret string…") that shouldn't be disclosed and, in that case, wouldn't it be better to keep the keys and kingdom in separate places? Granted, the original password isn't among the stored values, and per-password salt values necessitate more extensive rainbow tables or, possibly, the generation of *many* new entries in those tables, and that's not exactly quick. However, after a data disclosure/leak, assuming it's detected, it seems the limiting factors preventing bad actors from discovering the plain text passwords are [only] time and compute resources. The latter becomes less and less of a limiting factor as time goes on due to things like GPUs, parallel processing and other advances (e.g. quantum computing).
Full disclosure: I'm not a cryptanalyst, so I probably don't have a good grasp of how practical (or not) discovering the plain text passwords would be after a data disclosure/leak that included the algorithm, version, hash and salt versus one that did not include the salt. So, I'm really asking some questions in the hope that both the questions and answers will be helpful to me and other readers.
Regards,
Ben R
Sorry for the late reply. I'm not a cryptanalyst either (though I can consult some). While you could store the salts separately, you'd still need to retrieve them per row (or your separate salt becomes a single point of failure). Doing it that way seems like security by obscurity to me, and it nudges you towards writing your own hashing code, which should be unnecessary.
Even if the salt is disclosed, the fact that it's unique per row makes an attack harder, because the attacker has to compute all the candidate hashes separately for each row instead of generating them once for all rows.
If anything’s still not clear, let me know; I can call up said cryptography people and try to get a clarification.