I was looking at the PRESENT lightweight block cipher presented here. You can see an implementation of it in Python here. It is basically a substitution-permutation network (SPN) with ultra-lightweight encryption and lightweight decryption.
The encryption algorithm is as follows:
generateRoundKeys()
for i = 1 to 31 do
    addRoundKey(state, k_i)
    sBoxLayer(state)
    pLayer(state)
end for
addRoundKey(state, k_32)
The decryption algorithm is as follows:
generateRoundKeys()
for i = 32 downto 2 do
    addRoundKey(state, k_i)
    pLayer_inv(state)
    sBoxLayer_inv(state)
end for
addRoundKey(state, k_1)
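For experimenting, both listings can be turned into a minimal Python sketch. It assumes the 80-bit key variant (PRESENT-80) and represents the 64-bit state and the 80-bit key register as Python integers; the function names are my own, chosen to mirror the pseudocode. Note that `decrypt` calls `generate_round_keys` up front and keeps all 32 round keys, which is exactly the approach the question below is about.

```python
# Minimal PRESENT-80 sketch (assumption: 80-bit key, state and key held as ints).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_INV = [SBOX.index(x) for x in range(16)]
# Bit i moves to P(i) = 16*i mod 63 (bit 63 stays), i.e. (i % 4) * 16 + i // 4.
PBOX = [(i % 4) * 16 + i // 4 for i in range(64)]


def generate_round_keys(key):
    """Return the round keys k_1..k_32 (as a list indexed 0..31)."""
    keys = []
    for i in range(1, 32):
        keys.append(key >> 16)  # k_i = leftmost 64 bits of the key register
        key = ((key & ((1 << 19) - 1)) << 61) | (key >> 19)      # rotate left by 61
        key = (SBOX[key >> 76] << 76) | (key & ((1 << 76) - 1))  # S-box on top nibble
        key ^= i << 15                                           # round counter into k19..k15
    keys.append(key >> 16)  # k_32
    return keys


def s_box_layer(state, box):
    """Apply a 4-bit S-box to each of the 16 nibbles of the state."""
    out = 0
    for j in range(16):
        out |= box[(state >> (4 * j)) & 0xF] << (4 * j)
    return out


def p_layer(state):
    """Move bit i of the state to position PBOX[i]."""
    out = 0
    for i in range(64):
        out |= ((state >> i) & 1) << PBOX[i]
    return out


def p_layer_inv(state):
    """Move bit PBOX[i] of the state back to position i."""
    out = 0
    for i in range(64):
        out |= ((state >> PBOX[i]) & 1) << i
    return out


def encrypt(plaintext, key):
    rk = generate_round_keys(key)
    state = plaintext
    for i in range(31):            # rounds using k_1 .. k_31
        state ^= rk[i]
        state = s_box_layer(state, SBOX)
        state = p_layer(state)
    return state ^ rk[31]          # final whitening with k_32


def decrypt(ciphertext, key):
    rk = generate_round_keys(key)  # whole schedule computed before decryption starts
    state = ciphertext
    for i in range(31, 0, -1):     # uses k_32 down to k_2
        state ^= rk[i]
        state = p_layer_inv(state)
        state = s_box_layer(state, SBOX_INV)
    return state ^ rk[0]           # final whitening with k_1
```

If the sketch matches the specification, `encrypt(0, 0)` should reproduce the all-zero test vector from the PRESENT paper, and `decrypt(encrypt(p, k), k)` should return `p`.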
My question concerns the decryption process. Given the master key, all the round keys (i.e. the 32 derived keys) can of course be generated, in either the forward or the inverse direction. For encryption, processing can start before the whole key schedule is known, because the round keys are consumed in the same order in which they are generated; as far as I know, this is not the case for decryption, which needs $k_{32}$ first. However, for a hardware implementation (which is what this cipher is designed for), I don't think it is efficient to generate the whole key schedule first, store it in memory, and only then start decrypting.

My questions are: am I missing something algorithmically, i.e. is there a way to start decrypting without first computing the whole inverse key schedule? And would it be theoretically safe to use a circular key schedule, for example $k_1 = k_{32}$, $k_2 = k_{31}$, and so on? Would that require more rounds?
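To make the "inverse direction" concrete: in PRESENT-80 each key-register update (rotate, S-box on the top nibble, XOR of the round counter) is a bijection, so each step can be undone. The sketch below assumes PRESENT-80; `update`, `inverse_update`, `round_keys_forward` and `round_keys_backward` are hypothetical helper names. `round_keys_backward` runs the forward schedule once to reach the register that holds $k_{32}$, then walks backwards, yielding $k_{32}, k_{31}, \dots, k_1$ one at a time while keeping only a single 80-bit register.

```python
# Sketch of an on-the-fly inverse key schedule (assumption: PRESENT-80).
MASK80 = (1 << 80) - 1
LOW76 = (1 << 76) - 1
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD, 0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
SBOX_INV = [SBOX.index(x) for x in range(16)]


def update(reg, counter):
    """One forward step of the PRESENT-80 key-register update."""
    reg = ((reg & ((1 << 19) - 1)) << 61) | (reg >> 19)  # rotate left by 61
    reg = (SBOX[reg >> 76] << 76) | (reg & LOW76)       # S-box on top nibble
    return reg ^ (counter << 15)                        # XOR counter into k19..k15


def inverse_update(reg, counter):
    """Undo update(): the three steps inverted in reverse order."""
    reg ^= counter << 15
    reg = (SBOX_INV[reg >> 76] << 76) | (reg & LOW76)
    return ((reg << 19) & MASK80) | (reg >> 61)         # rotate right by 61


def round_keys_forward(key):
    """k_1..k_32 in encryption order."""
    keys, reg = [], key
    for i in range(1, 33):
        keys.append(reg >> 16)
        if i < 32:
            reg = update(reg, i)
    return keys


def round_keys_backward(key):
    """Yield k_32, k_31, ..., k_1 without storing the whole schedule."""
    reg = key
    for i in range(1, 32):      # 31 forward steps: reg now holds k_32
        reg = update(reg, i)
    for i in range(32, 0, -1):
        yield reg >> 16         # k_i
        if i > 1:
            reg = inverse_update(reg, i - 1)
```

In this arrangement the 31 forward steps still have to run once per key, but their result is a single 80-bit register rather than 32 stored round keys; whether that is acceptable for a given hardware budget is a separate question.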