The intuition is certainly that the last rounds can be removed one by one. If a differential distinguisher allows us to recognize the $n$-th round key, then surely the differential nonuniformity after $n-1$ rounds that allowed us to do this will be even worse at round $n-2$, so clearly the same trick can be used to recover the round key for round $n-1$ and so on... right?
Like any good intuition, this picture of things is incredibly useful and deeply flawed at the same time.
The most practically relevant objection to this is probably that in many attacks (differential or otherwise) that attack ciphers round by round, we only obtain partial information about the target subkey at each step. In other words, when we have used up all of the information that our differential distinguisher gives us about the last subkey, there might still be a rather large number of remaining reasonable candidates. Now if we have a distinguisher, it is true that the number of these candidates will be lower than if we had to try all of the last subkeys, but going down the rounds back to the source, it is still quite possible that the resulting search tree ends up bigger than the number of possibilities that we would have had to consider had we just done exhaustive search.
The usual two ways for attacks to get around this problem are to either show that the distinguisher is indeed good enough to prevent search tree explosion, or to show that at some point, we do not have to treat subsequent round keys as independent from our current final round key hypothesis any more because we can run the key schedule in reverse. In the former case, we can indeed solve the cipher round-by-round, whereas in the latter case, we end up with a set of candidate master keys that is possibly large but, for a successful attack, still smaller than the set of all master keys. These keys can then be tested against some other known information (most commonly known plaintext-ciphertext pairs) one by one.
The assumption that we can run the key schedule in reverse once we know a sufficient number of subkeys is satisfied for most practical ciphers, but is certainly not a law of nature. One could easily imagine a cipher where the subkeys would be derived from the master key by a one-way function, in which case attacks would indeed have to either resort to brute force search or solve for the subkeys as if they were independent.
There are various other cases where the intuition fails that a technique that allows us to solve $n$ rounds will also allow us to solve $n-1$ rounds; for instance, in differential attacks the distinguisher used to break $n$ rounds could depend on a particular difference in round $n-1$ that transitions to round $n-2$ deterministically, in which case we gain no new information when we descend to round $n-2$, or dependencies in the key schedule could make the expected differential nonuniformity of a cipher go up instead of down as rounds are added (although admittedly, this last case happens only in pathological cases).