Let me try a basic high-level explanation.
The modern approach is to encrypt a message $m$ as a ciphertext $c$ under a secret key $s$. To achieve security, we slightly perturb $c$ with a small noise (or error) term $e$, which becomes part of $c$. Whenever we perform (homomorphic) operations on $c$, $e$ increases. This puts a limit on the number of operations: $e$ must not grow too much, since otherwise we can no longer decrypt correctly. (At this point, you may picture the noise as static in a radio signal: the music becomes unintelligible if there is too much background noise.)
Furthermore, we can consider decryption as an operation that removes the noise $e$ and gives us back the original message $m$. Needless to say, decryption requires the secret key $s$.
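To make the noise picture concrete, here is a minimal toy sketch in Python. It is not a real or secure scheme: the modulus `Q`, the integer key, the explicit noise argument `e`, and all function names are assumptions I made up for illustration. A bit $m$ is hidden as $b = a \cdot s + e + m \cdot Q/2 \bmod Q$, decryption removes the mask $a \cdot s$ and rounds away $e$, and adding two ciphertexts adds their noise.

```python
import random

Q = 1024  # toy ciphertext modulus (assumed; far too small to be secure)

def encrypt(m, s, e, rng):
    """Encrypt bit m under secret key s, with the noise e passed explicitly."""
    a = rng.randrange(Q)                 # random public mask
    b = (a * s + e + m * (Q // 2)) % Q   # message sits at 0 or Q/2, plus noise
    return (a, b)

def decrypt(c, s):
    """Remove the mask a*s, then round to the nearest multiple of Q/2.

    The rounding is what removes the noise; it is correct while -Q/4 <= e < Q/4.
    """
    a, b = c
    v = (b - a * s) % Q                  # equals m*(Q/2) + e mod Q
    return ((v + Q // 4) // (Q // 2)) % 2

def add(c1, c2):
    """Homomorphic XOR of the two encrypted bits; the two noise terms add up."""
    return ((c1[0] + c2[0]) % Q, (c1[1] + c2[1]) % Q)

if __name__ == "__main__":
    rng = random.Random(0)
    s = 123                              # toy secret key
    acc = encrypt(1, s, 40, rng)         # encrypts 1 with noise 40
    for k in range(2, 9):
        acc = add(acc, encrypt(0, s, 40, rng))   # adding 0 should keep the bit at 1
        print(k, "summands, noise", 40 * k, "-> decrypts to", decrypt(acc, s))
```

With these numbers each summand contributes noise 40, so the sum decrypts correctly up to 6 summands (noise 240) and gives the wrong bit from 7 summands on (noise 280, which exceeds $Q/4 = 256$). That is the "too much background noise" situation.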
Now, to achieve FHE, i.e. unlimited operations on $c$, we need a way to reduce the noise $e$ for future operations, but without decrypting $c$ at all. Currently, we only know one way to achieve this: We need to homomorphically evaluate the decryption procedure. Because we know that the decryption procedure removes the noise in general, we can apply it in such a way that it reduces the large noise accumulated in a ciphertext.
Now the details are hard to grasp, but to apply the decryption procedure homomorphically, you need to create a ciphertext encrypting (something like) the message $s$, the secret key itself. Without this, you wouldn't be able to evaluate decryption homomorphically, since $s$ is part of the decryption formula. As you noticed, this demands circular security, i.e. the assumption that the scheme remains secure even when $s$ is encrypted under $s$ itself.
Now, informally speaking, if evaluating the decryption procedure as a sequence of homomorphic operations creates less noise than it removes, you end up with less noise than before. Thus you can continue performing operations on $c$ and repeat this bootstrapping process (the name refers to the scheme lifting itself up to support more operations) to achieve FHE.
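The bookkeeping behind this argument can be sketched as a pure noise-budget simulation. Nothing is encrypted here, and all constants and the function name `run_ops` are made-up assumptions for illustration. The model is that every homomorphic operation adds a fixed amount of noise, and that bootstrapping replaces the current noise by the fixed noise produced by evaluating decryption homomorphically.

```python
NOISE_LIMIT = 100      # decryption fails at or above this level (assumed)
FRESH_NOISE = 5        # noise of a freshly encrypted ciphertext (assumed)
OP_NOISE = 10          # noise added by one homomorphic operation (assumed)
BOOTSTRAP_NOISE = 30   # noise left after evaluating decryption homomorphically (assumed)

def run_ops(total_ops, use_bootstrapping):
    """Simulate the noise level only; return (operations done, bootstraps used)."""
    noise = FRESH_NOISE
    done = 0
    bootstraps = 0
    for _ in range(total_ops):
        if noise + OP_NOISE >= NOISE_LIMIT:
            # The next operation would make the ciphertext undecryptable.
            if not use_bootstrapping:
                break
            noise = BOOTSTRAP_NOISE      # refresh: large noise is replaced by a smaller one
            bootstraps += 1
        noise += OP_NOISE
        done += 1
    return done, bootstraps

if __name__ == "__main__":
    print(run_ops(1000, use_bootstrapping=False))
    print(run_ops(1000, use_bootstrapping=True))
```

In this model the run without bootstrapping stops after 9 operations, while the run with bootstrapping completes all 1000. The condition that matters is `BOOTSTRAP_NOISE + OP_NOISE < NOISE_LIMIT`: a refreshed ciphertext must leave room for at least one more operation, otherwise bootstrapping gains nothing.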