I doubt that anyone reading this is unaware of the recent attack on phpbb.com. One of the more interesting side effects of the exploit is finding out just how poor people are at picking passwords.
Good Password Practice
I realize that picking yet another password (yawn) is not the most exciting thing people do when joining a new board. Yet the recent exposure of the phpbb users table reveals just how important this is. One habit that I have followed for years (and I strongly encourage everyone else to adopt it as well) is to pick a unique password for each and every site I use. This means having a different password for your Amazon account, your eBay account, your online bank account, your stock brokerage account, your email account, your… okay, I imagine you get the point by now. Each site has to have a unique password, because you never know how (or when) an exploit could occur.
There are several typical objections to this advice, and I will try to address each of them.
“I can’t remember different passwords, so I use the same one on every site.”
That’s a bad idea. Now when any one of those sites gets exposed, a potential attacker has a password that they can use at any other site that you have registered for.
“This is not an important site, why should I use a secure password?”
When you join phpbb.com (or any other phpbb-based discussion board) it’s very unlikely that you will be involved in discussions about national defense. Yet that’s no excuse for reusing a password from another site (see earlier point) or for using nothing stronger than a dictionary word.
How md5() Works
I want to digress a bit and talk about md5 and the difference between a hashing algorithm and an encryption algorithm. When you need to know the content, you have to use encryption because at some point you need to reverse the algorithm. When you don’t need to know the content, then a hashing algorithm can be more secure. During the password confirmation step I don’t “unhash” the originally stored value because it’s not possible. Instead I take the new input (the password entered on the form) and hash it. If the new hash value matches the hash stored in the database, then the password is correct. Note that there is a mathematical chance – however slim – that two different passwords could result in the same hash.
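The confirmation step described above can be sketched in a few lines. This is only an illustration in Python rather than PHP: hashlib.md5 stands in for PHP's md5(), and the names md5_hex and check_password are made up for this example, not taken from phpBB.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as 32 lowercase hex characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def check_password(entered: str, stored_hash: str) -> bool:
    """Hash the form input and compare it with the stored hash.

    Nothing is ever "unhashed"; only two hash values are compared.
    """
    return md5_hex(entered) == stored_hash


# At registration only the hash is stored, never the password itself.
stored = md5_hex("FlUfFy")

print(check_password("FlUfFy", stored))  # same input, same hash, so this matches
print(check_password("fluffy", stored))  # different input, so this does not match
```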
Simply put: the most important distinction between a hash and encryption is that encryption is a two-way function and hashing is a one-way function. There is no way to derive the source data from a hash. It’s as if I gave you the number “42” and asked you for the source expression. It could have been 40 + 2, or 21 * 2, or any of a number of possibilities. There is no way to know, as the source information is “lost” during the process of hashing.
This doesn’t mean that hashing is a bullet-proof process. It’s quite simple to take a dictionary full of words and hash each one. Once this is done, if I get a list of password hash values then I can match hash values found in the database against the hash values from my dictionary table. If I find a match, then I know that person’s password. It wasn’t done by reversing the process, it was done by applying the process to a large number of values and finding a match.
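A dictionary attack of this kind is short enough to sketch. The word list and the "leaked" hashes below are invented for illustration, and Python's hashlib.md5 is used in place of PHP's md5(); a real attacker would start from a far larger list.

```python
import hashlib


def md5_hex(text: str) -> str:
    """Return the MD5 digest of text as a hex string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


# Hypothetical word list, hashed once up front.
WORDS = ["password", "passw0rd", "password1", "drowssap", "123456", "fluffy"]
LOOKUP = {md5_hex(word): word for word in WORDS}


def crack(leaked_hashes):
    """Map each leaked hash to its password if it is in the table, else None."""
    return {h: LOOKUP.get(h) for h in leaked_hashes}


# Made-up leaked hashes: two weak passwords and one that is not in the list.
leaked = [md5_hex("123456"), md5_hex("drowssap"), md5_hex("Fl$UfFy")]

for h, word in crack(leaked).items():
    print(h, "->", word if word is not None else "(no match)")
```

Nothing is reversed here. The table is built by running the hash forward over every candidate, and the only work per leaked hash is a lookup.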
There is a link at the bottom of this post that details some of the passwords that were revealed. They weren’t very good. According to the article, 16% of the revealed passwords were the user’s first name. 4% of the passwords were some variation of the word “password” like passw0rd or password1 or even drowssap (password spelled backwards). And the number one individual password of those exposed during the attack was 123456.
What Can I Do?
The best approach to password security is to make sure the password you use is unique for each site that requires registration. Even something as simple as mixing upper and lower case letters makes the password harder to reveal. For example the md5() of “fluffy” (the name of your first pet) is ce7bcda695c30aa2f9e5f390c820d985 while the md5() of FlUfFy is 2b94ed30c45aada57410ed8a4db29159 instead. They are the same letters but a completely different hash. Adding just one symbol (like $, *, %, or anything that requires the shift key on a typical US keyboard) makes it even harder.
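To put rough numbers on "harder", here is a small sketch that counts how many six-character passwords exist for different character sets. The six-character length and the choice of "all printable ASCII punctuation" as the symbol set are my assumptions for the example, and the function name keyspace is made up.

```python
import string


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of the given length over an alphabet."""
    return alphabet_size ** length


LENGTH = 6  # same length as "fluffy"

lower = len(string.ascii_lowercase)  # 26 letters
mixed = len(string.ascii_letters)    # 52 letters, upper and lower case
with_symbols = mixed + len(string.digits) + len(string.punctuation)  # 94 characters

for label, size in [
    ("lower case only", lower),
    ("mixed case", mixed),
    ("mixed case, digits, symbols", with_symbols),
]:
    print(f"{label:<30}{keyspace(size, LENGTH):>18,}")
```

Going from lower case to mixed case multiplies the number of six-character candidates by 64 (2 to the 6th power), which is why even that small change makes a brute-force search noticeably more work.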
A dictionary attack is useful against dictionary words. It might even include standard “words” that are not in the dictionary but expected to be found as passwords (like the passw0rd example used earlier). If that were the only attack against md5() then setting up a simple rule that requires at least one number or special symbol would seem to make the password data safe. Unfortunately that isn’t the case. Since the md5() algorithm has been published for quite a while, there has been quite a bit of attention paid to it. One of the results is called a “rainbow table” and the full description can be found in the Wiki link at the end of this post. A simple summary is this: instead of storing every input and its hash the way a dictionary does, a rainbow table stores only the first and last value of long chains of hashes, and rebuilds the values in between when it needs them.
Rainbow tables use a refined algorithm with a different reduction function for each “link” in a chain, so that when there is a hash collision in two or more chains the chains will not merge as long as the collision doesn’t occur at the same position in each chain. As well as increasing the probability of a correct crack for a given table size, this use of multiple reduction functions approximately doubles the speed of lookups. See the Wiki link at the end of this post for details.
Rainbow tables are specific to the hash function they were created for; e.g., MD5 tables can crack only MD5 hashes.
What does this mean? It means that even a more complex password like “sgfnyd”, which is clearly not a dictionary word, can be obtained.
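The chain idea is easier to see in a toy version. The sketch below is a deliberately tiny illustration and not a real rainbow table implementation: it assumes three-letter lower case passwords, a chain length of 50, and a made-up reduction function (reduce_hash) that uses the chain position so each link reduces differently. Only the start and end of each chain are stored.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
PW_LEN = 3       # toy keyspace: 26**3 = 17,576 passwords
CHAIN_LEN = 50   # number of hash/reduce steps in each chain


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def reduce_hash(hash_hex: str, position: int) -> str:
    """Map a hash back into the password space; the result depends on position."""
    n = (int(hash_hex, 16) + position) % (len(ALPHABET) ** PW_LEN)
    chars = []
    for _ in range(PW_LEN):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)


def chain_end(pw: str, start_pos: int = 0) -> str:
    """Apply hash-then-reduce from start_pos to the end of the chain."""
    for pos in range(start_pos, CHAIN_LEN):
        pw = reduce_hash(md5_hex(pw), pos)
    return pw


def build_table(starts):
    """Store only endpoint -> list of start points, not the links in between."""
    table = {}
    for start in starts:
        table.setdefault(chain_end(start), []).append(start)
    return table


def lookup(target_hash: str, table):
    """Return a password whose hash is target_hash if a chain covers it, else None."""
    for pos in range(CHAIN_LEN - 1, -1, -1):
        # Guess that the target hash sits at this position and finish the chain.
        end = chain_end(reduce_hash(target_hash, pos), pos + 1)
        for start in table.get(end, []):
            # Rebuild the chain from its start and look for the matching link.
            pw = start
            for i in range(CHAIN_LEN):
                if md5_hex(pw) == target_hash:
                    return pw
                pw = reduce_hash(md5_hex(pw), i)
    return None


table = build_table(["abc", "dog", "cat", "sun"])

# Take the password that sits 20 steps into the chain that starts at "dog".
secret = "dog"
for pos in range(20):
    secret = reduce_hash(md5_hex(secret), pos)

print("chains stored:", sum(len(starts) for starts in table.values()))
print("recovered:", lookup(md5_hex(secret), table), "| actual:", secret)
```

Four stored chains cover up to 200 passwords here. The trade is storage for computation: the table stays small, and each lookup recomputes part of a chain instead of reading a stored answer.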
Add a Pinch of Salt
When cooking you almost always add salt to a dish, as it can enhance the flavor. Salting is also a process that alters the results of the hashing process, making passwords that much more difficult to determine. From what I have written so far, you should gather that the entire process of “cracking” md5 data is based on finding an input that matches the output, since it is not possible to reverse the process. Dictionary lists and rainbow tables are both ways to do this.
If I can force your password to be outside of the normal result set from the md5() function, it would become that much more difficult to match. That process is called “salting the password” and it can be done by the application rather than the user, thus making its use mandatory rather than optional. It could be as simple as this:
$hash = md5($user_password . $user_regdate);
Both values are known to the application, and together they are (in theory) unique to the user. The registration date for phpbb boards is down to the second. Even if two users register at exactly the same second, the odds of them using the exact same password are probably extremely small. As a result their password hashes will almost never be the same. What is the advantage of this?
The salt “moves” the hash to a completely new value. Suppose that a hacker has a dictionary of hashed values. He knows that the hash of the word “fluffy” is ce7bcda695c30aa2f9e5f390c820d985 as I posted earlier. However, the user registered at 1234542068 (it’s a unix timestamp) so the input to the hashing function isn’t just “fluffy” but instead is “fluffy1234542068”. The result of that hash is b55e24f4bf18b9104f5af87b79e21468, which has no resemblance whatsoever to the expected hash value for the simple word the user picked for their password.
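Here is a small sketch of that effect, again in Python with hashlib.md5 standing in for PHP's md5(). The function name salted_hash and the three-word attacker dictionary are made up for illustration; the password and timestamp are the ones from the example above.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def salted_hash(password: str, regdate: int) -> str:
    """Append the registration timestamp to the password before hashing."""
    return md5_hex(password + str(regdate))


# Hypothetical attacker dictionary, precomputed from unsalted words.
unsalted_lookup = {md5_hex(w): w for w in ["fluffy", "password", "123456"]}

plain = md5_hex("fluffy")
salted = salted_hash("fluffy", 1234542068)

print("unsalted:", plain, "->", unsalted_lookup.get(plain))
print("salted:  ", salted, "->", unsalted_lookup.get(salted))
```

The unsalted hash is found in the precomputed dictionary; the salted hash of the very same password is not, because "fluffy1234542068" was never one of the inputs the attacker hashed.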
Even if the user database is exposed, the hacker would have to recalculate the dictionary results for every potential word in order to look for a match. If the same salt is used for every person, they have to generate one new table. If a different salt is used for each person, they would have to generate a dictionary lookup for every unique member of the board.
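The extra work a per-user salt creates can be sketched as follows. The user rows, timestamps, and word list are invented for the example, and the sketch assumes the attacker already knows each user's salt (the registration date) from the exposed table.

```python
import hashlib


def md5_hex(text: str) -> str:
    return hashlib.md5(text.encode("utf-8")).hexdigest()


WORDS = ["fluffy", "password", "123456"]  # hypothetical attacker word list

# Hypothetical leaked rows: (username, regdate used as salt, stored hash).
users = [
    ("alice", 1234542068, md5_hex("fluffy" + "1234542068")),
    ("bob", 1234542999, md5_hex("fluffy" + "1234542999")),
    ("carol", 1234543500, md5_hex("Fl$UfFy" + "1234543500")),
]


def crack_salted(users, words):
    """Re-hash the whole word list with each user's own salt."""
    found, hashes_computed = {}, 0
    for name, regdate, stored in users:
        for word in words:
            hashes_computed += 1
            if md5_hex(word + str(regdate)) == stored:
                found[name] = word
    return found, hashes_computed


found, cost = crack_salted(users, WORDS)
print("recovered:", found)
print("hashes computed:", cost, "=", len(users), "users x", len(WORDS), "words")
```

Alice and Bob picked the same weak password, yet their stored hashes differ, so nothing computed for one of them can be reused for the other. The weak passwords still fall, but the attacker pays for the whole word list once per user instead of once per board.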
And if our user followed good practices and used something more complex than “fluffy” for their password, the amount of time it would require to find an input match to the hash – even if the salt value is known – is simply too much effort.
Is md5() the real problem? I don’t think it is. Weak passwords are the real problem. Because md5() has been around for a long time there are published ways to attack the data. Once the data becomes exposed then weak passwords become revealed. Salting the password can be done at the application level, and essentially enforces the concept of a stronger password. I am planning to review the password hashing process in phpBB3 and see how to port it back to phpBB2, but first I plan to write a salting MOD as I think it will be much easier to do, and it seems to be a fairly large improvement for what I believe will be a minimal effort.
- Password Analysis from phpbb.com user database
- Wiki on Rainbow Tables
- Salt (Cryptography)