Updated 01/01/2023
TL;DR
Making a typical assumption about the reliable engineering of future physical qubits, about 2.5 million physical qubits will be needed (compare to a estimated figure of 0.95 million physical qubits for NIST P256 using similar methods). More detailed costing methods might increase these by a factor of around 10.
Logical Qubits
The question is easiest to answer in terms of logical qubits which are notional, idealised, computational resources that have arbitrary long lifetimes without information loss, execute gate operations perfectly and can be measured with perfect accuracy. We do not know how to engineer such resources, but if we could the calculations of Häner et al. show that about as few as 4258 logical qubits might be required to break NIST P521 (see Table 1 of their paper). They offer three parameterisations, trading off "width" (number of logical qubits), $T$-gate count (which we will see is important for physical qubit requirements) and "depth" (commensurable with run time). We'll consider all three parameterisations in the following.
Physical Qubits
In terms of qubits that that we can foresee engineering that have a plausible lifetime, gate fidelity and measurement resilience, the calculation becomes more involved as there are overheads from error-correction. It's also worth noting that if the mathematical ideas for error correction improve, then so will the overheads. It's possible to get very detailed in the modelling of qubits, but a relatively easy to follow method was laid out by Aggrawal et al., per appendix A of their paper the number of physical qubits necessary to implement a certain number of Clifford gates can be estimated* as
$$3.125\times n_L\times d_C^2$$
where $n_L$ is the number of logical qubits and $d_C$ is the error-correction code distance needed. Craig Gidney points out that a more modern error-correction approach allows the (smaller) estimate
$$2(d+1)^2.$$
For surface code error correction, Aggarwal et al approximate $d_C$ according to
$$\min_{d\in\mathcal N} (80p)^{\frac{d+1}2}\ge n_C^{-1}$$
where $p$ is the probability of a an error in gate computation and $n_C$ is the number of (Clifford) gates required by the circuit.
In addition to the computational overhead for error correction, there is also an issue that error-correction is not compatible with all gates required for quantum computation (specifically non-Clifford gates). The need for non-Clifford gates can be obviated with "state factories" that create auxiliary qubits to replicate non-Clifford gate functionality. This calculation is requires a bit more optimisation, but is still tractable (see Table III of their paper.
Putting this all together for a value $p=5\times 10^{-4}$ (which was used by Aggrawal, but is an improvement of the $10^{-3}$ estimate used by Gidney et al in their seminal 2048-bit factoring costing) we get a "circuit cost" for Häner et al. low width parameters
- $n_L=4258$
- $n_C = 1.85\times 2^{37}$
- $d_C = 16$
- $n_P = 2461124$ physical qubits.
For the low $T$-gate parameters we have
- $n_L = 5273$
- $n_C = 1.45\times 2^{35}$
- $d_C = 15$
- $n_P = 2699776$ physical qubits.
For the low depth parameter set we have
- $n_L = 5789$
- $n_C = 1.10\times 2^{37}$
- $d_C = 15$
- $n_P = 2963968$ physical qubits.
In all three cases, there is an additional state factory cost of 27000 qubits.
Note that the work of Aggrawal et al was followed by a more detailed analysis by Webber et al which produced slightly larger physical qubit requirements ($13\times 10^6$ over one day for P256 as opposed to an Aggarwal figure of $9.5\times 10^5$ over a few days based on the Häner et al. parameters). The Webber analysis is harder to recreate.
(*) - The Aggarwal paper quotes $3.125n_Ld_C$ which is a misprint.