The question of which HTTP status codes to use inevitably comes up when designing RESTful services. There is a bit of subtlety, however, with regard to the 401 and 403 status codes. Here and there you will read advice suggesting to use:
- 401 for when an access token isn’t provided, or is invalid.
- 403 for when an access token is valid, but requires more privileges.
The problem is that such an implementation can leak sensitive information. Here is my reply to the above-mentioned article. In cryptography this is known as an oracle, and it can lead to pretty serious attacks.
If the attacker sees a 403 status code, after a huge array of failed attempts with 401, this means that whatever the attacker tried passed the authentication phase, but failed the authorisation. Bingo, they just figured out a valid set of credentials (or tokens or ...) and your system said that in "clear text". It is similar to the "Username or password invalid" best practices (and don't read things like this!).
How to avoid the problem? Stick with one status code (e.g. 403) for both cases (failed authentication and failed authorisation), and avoid being too user (or client) friendly. You can log the incident and return a unique request ID to the client. If a genuine client reports the problem, that request ID should help find more details in the logs. And of course track/monitor all authentication/authorisation failures for anomaly detection purposes.
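As a rough illustration of the advice above, here is a minimal Python sketch. Everything in it is hypothetical and made up for the example: the `check_access` function, the in-memory `VALID_TOKENS` and `RESOURCE_ACL` tables, and the shape of the response body. It only shows the idea of returning the same 403 for both failure modes while keeping the real reason in the logs, keyed by a request ID.

```python
import logging
import uuid

logger = logging.getLogger("access")

# Hypothetical in-memory data, for illustration only.
VALID_TOKENS = {"token-alice": "alice", "token-bob": "bob"}
RESOURCE_ACL = {"/reports": {"alice"}}


def check_access(token, resource):
    """Return (status, body); both failure modes look identical to the client."""
    request_id = str(uuid.uuid4())
    user = VALID_TOKENS.get(token)
    if user is None:
        reason = "authentication failed"
    elif user not in RESOURCE_ACL.get(resource, set()):
        reason = "authorisation failed"
    else:
        return 200, {"request_id": request_id}
    # The real reason is written to the log only, never sent to the client.
    logger.warning(
        "access denied request_id=%s resource=%s reason=%s",
        request_id, resource, reason,
    )
    return 403, {"error": "Forbidden", "request_id": request_id}
```

Calling `check_access("bogus", "/reports")` and `check_access("token-bob", "/reports")` both yield a 403 with the same error text; only the request ID differs, and that ID is what a genuine client would quote when reporting the problem.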
Now the fun part, all of the above as a mathematical proof. Let's assume a system where the maximum number of possible credentials is $N$, which has $M$ registered users, $K$ of whom have access to a specific resource that the attacker tries to brute-force, with $N>M\geq K$. Let's consider the following propositions/events:
- $A$ - guess a credential by accessing the given resource.
- $B$ - The system is designed to return: HTTP 200 for valid credentials with access to the resource, HTTP 403 for valid credentials with no access to the resource, HTTP 401 for invalid credentials. For simplicity, we can say $B=\{200\}\bigcup \{403\}\bigcup \{401\}$.
- $C$ - The system is designed to return: HTTP 200 for valid credentials with access to the resource, HTTP 403 for valid credentials with no access to the resource or invalid credentials. Or $C=\{200\}\bigcup \{403\}$
In other words:
- Cardinality of $A$ given $B$ is: $K$ possible cases of HTTP 200, plus $M-K$ possible cases of HTTP 403. Grand total is $M$.
- Cardinality of $A$ given $C$ is: $K$ possible cases of HTTP 200, which is also the grand total.
Now let's compute probabilities: $$P(A \mid B)=\frac{M}{N}$$ and $$P(A \mid C)=\frac{K}{N}$$ Obviously (because $M \geq K$) $$P(A \mid B) \geq P(A \mid C)$$ which means that a system designed like $B$ gives the attacker a greater chance of guessing credentials. The two designs are equivalent only when all the registered users have access to the resource (i.e. $K=M$). End of discussion!
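To get a feel for the size of the gap, here is a small Python sketch that evaluates the two formulas above with exact fractions. The function name `guess_probabilities` and the sample values for $N$, $M$ and $K$ are made up for illustration; they are not taken from any real system.

```python
from fractions import Fraction


def guess_probabilities(n, m, k):
    """Return (P(A|B), P(A|C)) for n possible credentials, m users, k with access."""
    if not (n > m >= k >= 0):
        raise ValueError("expected n > m >= k >= 0")
    p_given_b = Fraction(m, n)  # 200 or 403 both confirm a valid credential
    p_given_c = Fraction(k, n)  # only 200 confirms a valid credential
    return p_given_b, p_given_c


# Hypothetical numbers: 1,000,000 possible credentials, 1,000 users, 10 with access.
p_b, p_c = guess_probabilities(n=1_000_000, m=1_000, k=10)
print(p_b, p_c)    # 1/1000 1/100000
print(p_b / p_c)   # 100
```

With these assumed numbers, a single guess is $M/K = 100$ times more likely to reveal a valid credential under design $B$ than under design $C$.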
P.S. The way I computed the probabilities may look a bit superficial, so here is another way, using the law of total probability: $$P(A\mid B) = \frac{P(A \cap B)}{P(B)}=\sum\limits_{C_n} \frac{P(A \cap B \cap C_n)}{P(B)}=\\ \sum\limits_{C_n} \frac{P(A \cap B \cap C_n)}{P(B\cap C_n)}\cdot \frac{P(B\cap C_n)}{P(B)}=\\ \sum\limits_{C_n} P(A \mid B \cap C_n)\cdot P(C_n\mid B)$$ Then:
- $$P(A\mid B)=P(A\mid B \cap \{200\})\cdot P(\{200\}\mid B)+\\ P(A\mid B \cap \{403\})\cdot P(\{403\}\mid B)+\\ P(A\mid B \cap \{401\})\cdot P(\{401\}\mid B)=\\ P(A\mid \{200\})\cdot P(\{200\}\mid B)+ P(A\mid \{403\})\cdot P(\{403\}\mid B)+\\ P(A\mid \{401\})\cdot P(\{401\}\mid B)=\\ 1\cdot P(\{200\}\mid B)+ 1\cdot P(\{403\}\mid B)+ 0\cdot P(\{401\}\mid B)=\\ \frac{K}{N}+\frac{M-K}{N}=\frac{M}{N} $$
- $$P(A\mid C)=P(A\mid C \cap \{200\})\cdot P(\{200\}\mid C)+\\ P(A\mid C \cap \{403\})\cdot P(\{403\}\mid C)=\\ P(A\mid \{200\})\cdot P(\{200\}\mid C)+ P(A\mid \{403\})\cdot P(\{403\}\mid C)=\\ 1\cdot P(\{200\}\mid C)+ 0\cdot P(\{403\}\mid C)=\frac{K}{N}$$