Generating secure cross site request forgery tokens (csrf) - ESchrade

by Kevin Schroeder | 1:03 pm

I don’t talk much about security. This is mostly because it’s such a moving target. I’m also horrified that I might give bad advice and someone will be hacked because of me.

But while researching the second edition of the IBM i Programmer’s Guide to PHP, Jeff and I decided to include a chapter on security, since we didn’t really talk much about it in the first edition. I’m writing about cross-site request forgeries right now, and I wanted to make sure that what I was going to suggest would not break the internet in some way.

I did some Google searching to see what other people were recommending. Almost all of the pages I found on generating a CSRF token use code like this:

$token = md5(uniqid(rand(), true));

The manual pages for rand() and uniqid() specifically state that these functions should not be used to generate secure tokens, and the C source shows why: they tend to generate predictable values. The documentation for md5() also states that it should not be used for password hashing. Granted, we’re not hashing passwords when creating a CSRF token, but with the tooling available, shouldn’t we be using functions that are more cryptographically secure? Like this?

$token = hash_hmac(
    'sha512',
    openssl_random_pseudo_bytes(32),
    openssl_random_pseudo_bytes(16)
);

Am I missing something, or wouldn’t something like this be a whole lot better?

[UPDATE]

padraicb confirmed my thinking on the matter. What matters here is the random value, so hashing it with hash_hmac() does not buy you much extra. A 32-byte random string has 2^256, or about 1.1579208923731619542357098500869e+77, possible values. That alone should be more than enough for a CSRF prevention token. mt_rand(), by contrast, returns an integer no larger than mt_getrandmax() (2^31 - 1 on standard builds), which gives you only about 2 billion possible values. While that will probably protect you, the larger random value offers better protection. There’s no sense in gambling with a smaller value when you can generate a larger one at virtually no additional cost.
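For a sense of scale, the two value spaces compare as follows. This assumes mt_rand() is called without arguments, so it returns values from 0 to mt_getrandmax(), which is 2^31 - 1 on standard PHP builds:

```latex
\begin{aligned}
\text{32 random bytes:}\quad & 2^{8 \cdot 32} = 2^{256} \approx 1.158 \times 10^{77} \\
\text{mt\_rand():}\quad & \texttt{mt\_getrandmax()} + 1 = 2^{31} \approx 2.147 \times 10^{9} \\
\text{ratio:}\quad & 2^{256} / 2^{31} = 2^{225}
\end{aligned}
```

An attacker guessing blindly needs on the order of 2^30 tries on average for the mt_rand() space, versus 2^255 for 32 bytes from OpenSSL.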

So it would seem that the code you really need to generate a proper token is this:

$token = base64_encode(openssl_random_pseudo_bytes(32));

The only reason for the base64_encode() call is to make sure that the value provided will not break your HTML layout.
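A minimal sketch of how that token might be issued, placed in a form and checked on submission, assuming PHP 7.1+ with the OpenSSL extension. The helper names issue_csrf_token(), csrf_field() and verify_csrf_token() are hypothetical, and the plain array stands in for $_SESSION so the example runs from the command line. It also swaps in hash_equals() (PHP 5.6+) for the final comparison, since an ordinary string comparison can leak timing information.

```php
<?php
// Hypothetical helpers; $store stands in for $_SESSION so this runs from the CLI.

// Issue a token from 32 bytes of OpenSSL randomness and remember it.
function issue_csrf_token(array &$store): string
{
    $bytes = openssl_random_pseudo_bytes(32, $strong);
    if ($bytes === false || $strong !== true) {
        // Fail closed instead of falling back to a weak source.
        throw new RuntimeException('No cryptographically strong randomness available');
    }
    $token = base64_encode($bytes); // 44 ASCII characters, safe in markup
    $store['csrf_token'] = $token;
    return $token;
}

// Render the hidden form field; escaping is cheap insurance.
function csrf_field(string $token): string
{
    return '<input type="hidden" name="csrf_token" value="'
        . htmlspecialchars($token, ENT_QUOTES, 'UTF-8') . '">';
}

// Check a submitted token against the stored one in constant time.
function verify_csrf_token(array $store, ?string $submitted): bool
{
    if (!isset($store['csrf_token']) || $submitted === null) {
        return false;
    }
    return hash_equals($store['csrf_token'], $submitted);
}

// Usage: issue when rendering the form, verify when it is posted.
$session = [];
$token = issue_csrf_token($session);
echo csrf_field($token), PHP_EOL;
var_dump(verify_csrf_token($session, $token));   // matching token
var_dump(verify_csrf_token($session, 'forged')); // anything else
```

In a real application you would call session_start() and pass $_SESSION as the store; the submitted value would come from $_POST['csrf_token'] ?? null.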

Security

Tags: security

42 COMMENTS

ezimuel

The real security problem in generating a secure CSRF token is the randomness of the seed. MD5 and SHA-512 are not so different in this case from a security point of view. openssl_random_pseudo_bytes() is the most secure way to generate good random numbers in PHP. For instance, in ZF2 we used that function to generate the CSRF token in Zend\Form.

Feb 11.2013 | 12:12 pm

kschroeder

Thanks, that’s a good point. In other words, whether you use md5() or SHA-512 is not as important as getting the actual random bits. The hashing itself is really only there to make sure that the bits that come out do not break the format. One could almost say that when using openssl_random_pseudo_bytes(), you could use md5(), hash_hmac() or base64_encode() without a loss of security, something that could not be said about uniqid().

Feb 11.2013 | 12:16 pm

kschroeder

…I should say a *significant* loss in security.

Feb 11.2013 | 12:16 pm

ezimuel

I would suggest using a hashing algorithm (MD5 or SHA-*) instead of base64 for the final token output, because it better obscures the seed (hashes are not invertible).

Feb 11.2013 | 12:37 pm

padraicb

@kschroeder The primary goal of the CSRF token is to be an unpredictable random string of sufficient length to defeat brute force attacks, so the OpenSSL PRNG on its own is sufficient. 32 bytes is a nice length (anything less than 8 is severely weak). Hashing or obscuring the token is unnecessary, since the random number itself is not a secret; what is sent to the user is. If that’s a hash, then the attacker only needs the hash. Base64 encoding merely ensures the token is a simple ASCII-compatible string.
Note: Tokens are generated securely as a standard practice. Also note the “pseudo” in the function name if concerned about entropy consumption ;).

Feb 13.2013 | 05:34 am

timoh

I’d like to add (as I posted to pmjones’ blog) that it is a bit misleading to say that openssl_random_pseudo_bytes() is “better” (security-wise) than any other method that relies on /dev/urandom (or its equivalent on Windows). Reading straight from /dev/urandom, or fetching bytes some other way that uses /dev/urandom, are all practically equal.
Care should be taken to avoid each method’s quirks when fetching random bytes: for example, openssl_random_pseudo_bytes() blocking on certain versions, /dev/urandom not being available on Windows, and security issues with mcrypt_create_iv() (using MCRYPT_DEV_URANDOM) on certain versions on Windows.

Feb 14.2013 | 04:52 am

ezimuel

@timoh You are right, but we were comparing mt_rand() or rand() with openssl_random_pseudo_bytes(), and the latter is better from a security point of view because it uses a random source like /dev/urandom. Moreover, openssl_random_pseudo_bytes() is also supported on Windows, where /dev/urandom is not available.

Feb 14.2013 | 06:05 am

timoh

Aah yep that’s right. I lost the context when pasting my comment from pmjones’ blog http://paul-m-jones.com/archives/4458/comment-page-1#comment-438733 😉

Feb 14.2013 | 07:29 am

siliconforks

That code used to be in the PHP manual:
http://web.archive.org/web/20090218083935/http://php.net/manual/en/function.uniqid.php
It was eventually removed.

Feb 11.2013 | 12:51 pm

kschroeder

It uses that as an example for generating a token, but that page also specifically states that it is based off of microtime. Because of that the value would be predictable.

Feb 11.2013 | 01:34 pm
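To make the predictability concrete, here is a small sketch that decodes a plain uniqid() value. It assumes the layout used by PHP’s C implementation (eight hex digits of Unix seconds followed by five hex digits of microseconds, i.e. "%08x%05x"), which is an implementation detail rather than a documented guarantee. The two helper functions are hypothetical.

```php
<?php
// Decode a plain uniqid() value. Assumes PHP's C layout "%08x%05x":
// 8 hex digits of Unix seconds, then 5 hex digits of microseconds.

function uniqid_seconds(string $id): int
{
    return (int) hexdec(substr($id, 0, 8));
}

function uniqid_microseconds(string $id): int
{
    return (int) hexdec(substr($id, 8, 5));
}

$before = time();
$id = uniqid();

echo "uniqid():             $id\n";
echo "time() just before:   $before\n";
echo "encoded seconds:      " . uniqid_seconds($id) . "\n";
echo "encoded microseconds: " . uniqid_microseconds($id) . "\n";
```

Knowing the second a token was issued leaves at most 1,000,000 microsecond values to try. In the uniqid(rand(), true) variant, the rand() prefix and the extra-entropy suffix add some bits, but both come from non-cryptographic generators.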

siliconforks

Yes, it would be predictable – presumably that’s why that code was removed. I’m just saying that is why you see that code all over the Internet (and in various open source projects) – it is because everyone originally copied it from the PHP manual.

Feb 11.2013 | 01:44 pm

harikt

I am not a cryptography expert.
Aura.Session uses uniqid(mt_rand(), true):
https://github.com/auraphp/Aura.Session/blob/develop/src/Aura/Session/CsrfToken.php#L81
The problem with OpenSSL is that it needs to be installed and configured on the server. I have also seen hash('sha256', uniqid(mt_rand(), true), true); used
at https://github.com/FriendsOfSymfony/FOSUserBundle/blob/master/Util/TokenGenerator.php#L60

Feb 12.2013 | 12:22 am

ezimuel

The problem with uniqid(mt_rand(), true) is that mt_rand() is not cryptographically secure. A more secure way to generate a random token is to use md5(openssl_random_pseudo_bytes(32)); or hash($algo, openssl_random_pseudo_bytes(128)); where $algo is one of the SHA-* algorithms. If you don’t have the OpenSSL extension enabled, you can use mcrypt_create_iv($length, MCRYPT_DEV_URANDOM); where $length is the number of random bytes. We implemented a random generator in ZF2 based on these considerations: https://github.com/zendframework/zf2/blob/master/library/Zend/Math/Rand.php#L25

Feb 12.2013 | 03:42 am

pmjones

I am not a security expert, so please be gentle.
What does the extra cryptographic security buy us? For long-lived hashes that get used over and over, I can see the point, but for short-lived tokens it seems like a bit of overkill.
Additionally, it seems like it would deplete the entropy available to the system more rapidly. Too many CSRF tokens that get used and thrown away means you don’t have the entropy when you need it for real security.

Feb 12.2013 | 08:54 pm

harikt

I think the same, since it is just a form token. Is cryptography really needed? Maybe for some kinds of systems, but not for all, I guess.

Feb 12.2013 | 11:27 pm

kschroeder

Cryptos (κρυπτός) and graphein (γράφειν) together just mean “secret writing”. When we’re generating a token, what we want to do is give the person on the web page a secret that will be extremely difficult to predict. The examples that I’ve found tend to rely on uniqid(), which is based on the time and is, thus, predictable. So when you think about cryptography, you are probably thinking about the actual act of encryption, which is not what we’re talking about. We are using a tool from one of the first steps in that chain to create an “unpredictable” value.
The 32 bytes (256 bits) of data give us 1.1579208923731619542357098500869e+77 values, which is a pretty big set of values for you to use and so I doubt that you would deplete entropy.
However, mt_rand() returns an integer, not a series of bytes. That means you have only 2 billion or so numbers (0 to mt_getrandmax()) to choose from. Given the choice between that and the huge number above, I would choose the huge number.

Feb 13.2013 | 07:53 am

harikt

Thank you for making it clear.

Feb 13.2013 | 08:10 am

pmjones

You are making the assumption, though, that a CSRF token falls in the realm of “cryptography.” (Perhaps it is.)
Is not a random shared value, sent along with the form, enough to defeat CSRF attacks? You say the random value is predictable and this may be true, but I’d like to see a demonstration of it. How much time and effort is required to predict it?

Feb 13.2013 | 08:21 am

kschroeder

There are parts of token generation that, on a basic level, do fall into the realm of cryptography since cryptography is about “writing secrets”. Beyond that the link to crypto is simply that the cryptographic tooling does a better job of providing more, better, pseudo-random values.
When we’re talking about predictability it will depend on which function we’re talking about. If you have a timestamp, uniqid() is actually pretty easy to guess. It was designed to be unique, not unpredictable. And mt_rand() isn’t so much predictable as it has a significantly smaller pool of values to choose from. In other words, mt_rand() is good, but openssl_random_pseudo_bytes() is better.

Feb 13.2013 | 08:32 am

timoh

When using cryptographically strong random bytes, you don’t have to worry about the edge cases and attack vectors that may appear when using weak randomness, e.g. when the system is under active attack. I’d make sure CSRF tokens are also generated using strong randomness (it is easy to make sure the system does not become vulnerable in any situation, edge cases included, because of weak randomness). If strong randomness is not available, just exit with an error.
About “deplete the entropy available”, this is actually not the case with /dev/urandom and alike. System random number generators (like /dev/urandom) do not run out of entropy. Urandom _might_ be low on entropy immediately after a fresh OS install, but this is insignificant when talking about web apps.

Feb 14.2013 | 05:09 am

pmjones

Additionally, the OWASP guys seem to think mt_rand() is sufficient for the purpose:
https://www.owasp.org/index.php/PHP_CSRF_Guard
I cannot say if their method is *actually* sufficient.

Feb 12.2013 | 08:58 pm

harikt

By the way @pmjones, they are using a hashing algorithm ($token = hash("sha512", mt_rand(0, mt_getrandmax()));), and at the top the page mentions that the code is not verified by OWASP experts.

Feb 12.2013 | 11:33 pm

ezimuel

I just sent an email to the author of PHP_CSRF_Guard suggesting that they use openssl_random_pseudo_bytes() instead of mt_rand(). I agree with @padraicb: the random number provided by OpenSSL is enough for a CSRF token, and you can just use it without a hash function.

Feb 13.2013 | 06:00 am

padraicb

The OWASP version relies on two options as a token:

A. The SHA512 hash of mt_rand().

The MD5 hashes of all outputs from mt_rand() are online. SHA256 hashes can be brute forced at incredible speeds on a GPU, making hashing fairly pointless for minimal-entropy inputs: it’s only a number between 0 and 2^31 (mt_getrandmax()). SHA512 is much, much slower than SHA256, but I can’t help wondering whether it’s so slow as to take TOO long running only 2.147B comparisons; most hashing tools have GPU support these days, and the last GPU generation was marvellous for this task. It wouldn’t surprise me if it took

Feb 18.2013 | 06:43 am
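padraicb’s point that hashing cannot add entropy is easy to demonstrate. The sketch below builds a token the way the OWASP example does (SHA-512 of an mt_rand() integer) and recovers the input by hashing every candidate. To keep it fast, it assumes a shrunken range of 0 to 99,999 (the hypothetical DEMO_MAX) instead of the full 0 to mt_getrandmax(); the attack is the same, only the loop is longer.

```php
<?php
// Hashing adds no entropy: if the input space is small, hash every candidate.
// DEMO_MAX is an assumption that shrinks 0..mt_getrandmax() so the demo is fast.
const DEMO_MAX = 99999;

// Same shape as the OWASP example: SHA-512 of an integer from mt_rand().
function weak_token(int $value): string
{
    return hash('sha512', (string) $value);
}

// Attacker's side: try every possible input until one hashes to the token.
function recover_input(string $token): ?int
{
    for ($i = 0; $i <= DEMO_MAX; $i++) {
        if (weak_token($i) === $token) {
            return $i;
        }
    }
    return null;
}

$secret = mt_rand(0, DEMO_MAX);
$found = recover_input(weak_token($secret));
echo "token input $secret, recovered $found\n";
```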

pmjones

Based on comments elsewhere, I see the point. Looks like I have to modify https://github.com/auraphp/Aura.Session to use OpenSSL when available, and only fall back to mt_rand() when it is not. Thanks, gentlemen.

Feb 13.2013 | 10:24 am

ezimuel

In ZF2 we used a chain of tests for random generation:
1) if OpenSSL is installed, we use openssl_random_pseudo_bytes();
2) if Mcrypt is installed, we use mcrypt_create_iv($length, MCRYPT_DEV_URANDOM), which uses the /dev/urandom source;
3) mt_rand() as a fallback, but only for non-cryptographic purposes.
More details here: https://github.com/zendframework/zf2/blob/master/library/Zend/Math/Rand.php#L25

Feb 13.2013 | 10:30 am
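A sketch of a similar fallback chain, assuming PHP 7.1+. It is not ZF2’s actual code (see Zend\Math\Rand for that), and the function name strong_random_bytes() is hypothetical. It differs in two deliberate ways taken from other comments here: it also tries reading /dev/urandom directly, as timoh describes, and instead of falling back to mt_rand() it throws, so a token is never built from weak randomness.

```php
<?php
// Hypothetical fallback chain for strong random bytes.
// It never falls back to mt_rand(): with no strong source, it throws.
function strong_random_bytes(int $length): string
{
    // 1) OpenSSL, accepted only if it reports a cryptographically strong result.
    if (function_exists('openssl_random_pseudo_bytes')) {
        $bytes = openssl_random_pseudo_bytes($length, $strong);
        if ($bytes !== false && $strong === true) {
            return $bytes;
        }
    }
    // 2) Mcrypt reading /dev/urandom (the extension left core in PHP 7.2).
    if (function_exists('mcrypt_create_iv')) {
        $bytes = mcrypt_create_iv($length, MCRYPT_DEV_URANDOM);
        if ($bytes !== false && strlen($bytes) === $length) {
            return $bytes;
        }
    }
    // 3) Read /dev/urandom directly where it exists (not on Windows).
    if (is_readable('/dev/urandom')) {
        $fh = fopen('/dev/urandom', 'rb');
        if ($fh !== false) {
            $bytes = fread($fh, $length);
            fclose($fh);
            if ($bytes !== false && strlen($bytes) === $length) {
                return $bytes;
            }
        }
    }
    // 4) No strong source: fail closed rather than use mt_rand().
    throw new RuntimeException('No cryptographically strong random source available');
}

echo base64_encode(strong_random_bytes(32)), PHP_EOL;
```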

pmjones

Cool. I see that the Math library does that; is it used in the Session library?

Feb 13.2013 | 10:43 am

ezimuel

Yes, it’s used to generate the CSRF token in Zend\Form, via Zend\Validator\Csrf and Zend\Session. Here are the details: https://github.com/zendframework/zf2/blob/master/library/Zend/Validator/Csrf.php#L288

Feb 13.2013 | 10:53 am

For CSRF tokens, mt_rand() is ok-ish but openssl_random_pseudo_bytes() is a lot better | Paul M. Jones

[…] Looks like we need to update Aura.Session to use openssl when available and fall back to mt_rand() when it’s not. Via Generating secure cross site request forgery tokens (csrf). […]

Feb 13.2013 | 10:30 am

ircmaxell

The perfect reason for not relying upon rand() or mt_rand() is that both are susceptible to seed poisoning: http://www.suspekt.org/2008/08/17/mt_srand-and-not-so-random-numbers/
So to produce strong random numbers, rand() or mt_rand() should not be used in a predictable manner: http://blog.ircmaxell.com/2011/07/random-number-generation-in-php.html
I’m working on splitting out the RNG from CryptLib and PasswordLib into a stand-alone dependency so that you can use its strong random mixer to produce these kinds of tokens (it uses many sources to generate the randomness, and is secure as long as any one source is secure)… https://github.com/ircmaxell/RandomLib

Feb 14.2013 | 07:30 am
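The seed-poisoning problem comes down to mt_rand() being fully determined by its seed. A minimal sketch: if any code in the same process calls mt_srand() with a guessable value (the timestamp below is a hypothetical example), every "random" token generated afterwards, hashed or not, can be regenerated by anyone who tries the same seed. The helper tokens_after_seed() is hypothetical.

```php
<?php
// mt_rand() is fully determined by its seed. If anything in the process calls
// mt_srand() with a guessable value, later "random" tokens can be regenerated.
function tokens_after_seed(int $seed, int $count): array
{
    mt_srand($seed); // e.g. some library seeding with a timestamp
    $tokens = [];
    for ($i = 0; $i < $count; $i++) {
        // Hashing the output does not help: the input sequence is reproducible.
        $tokens[] = hash('sha512', (string) mt_rand(0, mt_getrandmax()));
    }
    return $tokens;
}

$issued = tokens_after_seed(1360800000, 3);  // what the application hands out
$guessed = tokens_after_seed(1360800000, 3); // what an attacker recomputes
var_dump($issued === $guessed);
```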

RenThraysk

I don’t like the reliance on random numbers.
I actually think your first suggestion of a HMAC is on the right path, but again not hashing random bytes.
The $data argument to hash_hmac() should be made up of serialised data. This should include the full URI the form is posted to, the session id, and any hidden values in the form.
This provides not only CSRF protection, but also another layer of validation to parts of the form.
The $key parameter for the CSRF could be a site wide secret, and do away with needing to use $_SESSION at all.

Feb 15.2013 | 10:33 am

kschroeder

Could you explain why hashing values that are relatively easy to figure out is better than a pseudo random number generator?

Feb 15.2013 | 10:47 am

ircmaxell

If it’s 100% deterministic for the server (has no random per-session data), then it’s 100% deterministic for the client. And that means it’s 100% deterministic for an attacker as well. Which basically means that the protection is useless at stopping CSRF style attacks…

Feb 16.2013 | 06:41 am

RenThraysk

The key for the HMAC is server side secret. The client or attacker never knows it.

Feb 16.2013 | 08:21 am

ircmaxell

Well, you do disclose the derivative of it (via the HMAC), so if they know what goes into the left side, they can attempt to brute force the right side. Not a huge issue, but something to think about.
But in the end, what does this gain you? A nonce is a proven technique that does not require storing cryptographic secrets (which is what your key really is), and it has good forward security (a breach today implies nothing about a breach tomorrow). Your method requires a cryptographic secret and has poor forward security (a breach today means a breach tomorrow). The rest of the security industry recommends using a random nonce, typically per request (but at least per session). So what major benefit does this add to that paradigm that makes it worth going against the rest of the industry?

Feb 16.2013 | 08:27 am

kschroeder

Additionally, 70% of all successful attacks come from inside an organization. Having a configurable value a) requires you to manage the key, and b) is something that an internal attacker may have knowledge of. Using a large pseudo-random number requires no configuration management and is not known by an internal individual. Defense in Depth, baby!

Feb 16.2013 | 09:32 am

RenThraysk

Even a large pseudo-random number gets written somewhere; that is what $_SESSION does. So an internal attacker can read it.

Feb 19.2013 | 08:54 am

RenThraysk

If an attacker can access the HMAC secret key on your server, you have more worrying concerns, like credentials to access databases directly.
I wouldn’t say it was going against the rest of the industry. The wider security field created Message Authentication Codes as a means of providing assurances about messages. The message in this case is an HTTP POST request.
Benefits:
It’s stateless.
Multiple forms on the same page, or a user with multiple pages of forms open, will work, and each form would have a different token.
It’s trivial to combine an expiration time with the token [expires.hmac(expires + data)], so you can shorten the time that a token remains valid, narrowing the window for replay attacks.

Feb 19.2013 | 09:36 am
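A minimal sketch of the stateless scheme described above, assuming PHP 7.1+. The token is expires.hmac(expires + data); the choice of SHA-256, the '|' separator and what goes into $data (here, the target URI and the session id) are assumptions for illustration, and the function names are hypothetical. Per ircmaxell’s caveat, $key must be a long random server-side secret that you now have to manage, and including the session id keeps tokens from being reusable across sessions.

```php
<?php
// Hypothetical stateless token: "<expires>.<hmac(expires|data)>".
function make_stateless_token(string $key, string $data, int $ttl, ?int $now = null): string
{
    $expires = ($now ?? time()) + $ttl;
    $mac = hash_hmac('sha256', $expires . '|' . $data, $key);
    return $expires . '.' . $mac;
}

function check_stateless_token(string $key, string $data, string $token, ?int $now = null): bool
{
    // Split off the expiry; reject anything that isn't "digits.mac".
    $parts = explode('.', $token, 2);
    if (count($parts) !== 2 || preg_match('/^\d+$/', $parts[0]) !== 1) {
        return false;
    }
    [$expires, $mac] = $parts;
    if ((int) $expires < ($now ?? time())) {
        return false; // expired: limits the replay window
    }
    // Recompute over the same fields and compare in constant time.
    $expected = hash_hmac('sha256', $expires . '|' . $data, $key);
    return hash_equals($expected, $mac);
}

// $data ties the token to the form's target URI and the session (assumed fields).
$key = 'replace-with-a-long-random-server-secret';
$data = 'https://example.com/account/email|' . 'session-id-goes-here';
$token = make_stateless_token($key, $data, 600);
var_dump(check_stateless_token($key, $data, $token));
```

The optional $now parameters exist only so the expiry logic can be exercised deterministically; in production they would be left as null.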

Stateless CSRF Tokens | Joseph Scott

[…] few notes about this approach. First, use openssl_random_pseudo_bytes instead of mt_rand ( suggested by Kevin Schroeder ) when possible. Second, be sure to only use === when comparing the token value. You want to avoid […]

Jul 24.2013 | 08:47 am

ari

I have a question. Imagine I have three functions: createToken(), deleteToken() and validationToken().

createToken() creates $_SESSION['token'] with a unique code.
deleteToken() removes $_SESSION['token'] with unset() and then calls createToken(), so once the old $_SESSION['token'] is removed, a NEW one is generated automatically.
validationToken() compares the token sent in $_POST['token'] via Ajax with $_SESSION['token']. If they match, deleteToken() runs, which means a new $_SESSION['token'] is created.

My question: how do I make the NEW $_SESSION['token'] appear in index.php without refreshing the page?

Feb 15.2016 | 04:45 am

Kevin Schroeder

Ajax, or a push event (such as with Pusher), would seem to be the only ways to do that. Whatever happens, the browser will be dependent on the server for receiving the data.

Feb 15.2016 | 07:49 am
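One way to implement the Ajax approach for ari’s flow, sketched under assumptions: the server-side handler validates the posted token, rotates it (deleteToken() plus createToken() in one step), and returns the new value in its JSON response, so the page’s script can write it into the hidden field without a reload. The function names and the JSON shape are hypothetical, the array stands in for $_SESSION, and the client-side JavaScript is not shown.

```php
<?php
// Hypothetical handler for ari's flow; $store stands in for $_SESSION.

// createToken(): replace any existing token with a fresh random one.
function create_token(array &$store): string
{
    $store['token'] = base64_encode(openssl_random_pseudo_bytes(32));
    return $store['token'];
}

// validationToken() + deleteToken() + createToken() for one Ajax POST.
// Returns a JSON body; on success it carries the new token for the page.
function handle_ajax_post(array &$store, ?string $posted): string
{
    $ok = isset($store['token']) && $posted !== null
        && hash_equals($store['token'], $posted);
    if (!$ok) {
        return json_encode(['ok' => false]);
    }
    return json_encode(['ok' => true, 'token' => create_token($store)]);
}

$session = [];
$first = create_token($session);                  // rendered into index.php
echo handle_ajax_post($session, $first), PHP_EOL; // valid: returns a new token
echo handle_ajax_post($session, $first), PHP_EOL; // old token now rejected
```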
