I claim that for a large class of block cipher round functions including the AES round function, the affine structure on the round key space and also on the block space is definable from block cipher round function. Therefore, the only possible automorphisms for AES and related block ciphers are the affine transformations.
The case when $K$ is a group acting on $X$
Suppose that $K$ is a group acting on the set $X$ where for each $k\in K\setminus\{e\}$, there is some $x\in X$ with $k\cdot x\neq x$. Suppose that $F_{k}(x)=F(k,x)=k\cdot P(x)$ for some permutation $P:X\rightarrow X$.
Observe that $F_{k}^{-1}(x)=P^{-1}(k^{-1}\cdot x)$. We observe that
$$F_{j}(F_{k}^{-1}(x))=j\cdot P(F_{k}^{-1}(x))=j\cdot P(P^{-1}(k^{-1}\cdot x))
=j\cdot k^{-1}\cdot x.$$
In particular, the mapping from $K^{4}\times X$ to $X$ defined by $(i,j,k,l,x)\mapsto i\cdot j^{-1}\cdot k\cdot l^{-1}\cdot x$ is definable. Now, observe that $ij^{-1}kl^{-1}=e$ if and only if $x=i\cdot j^{-1}\cdot k\cdot l^{-1}\cdot x$ for all $x\in X.$ Therefore, the set of all $(w,x,y,z)\in K^{4}$ where $wx^{-1}yz^{-1}=e$ is definable in terms of the function $F$. Therefore, the function $K^{3}\rightarrow K,(x,y,z)\mapsto xy^{-1}z$ is definable in terms of the function $F$. The operation $K^{3}\rightarrow K,(x,y,z)\mapsto xy^{-1}z$ is called a heap operation, and the heap operation can be thought of as an operation from which you can recover most of the information from the group, but you determine which element of the group was the identity from the heap operation; on the other hand, you can easily recover the group operation from the heap operation along with the identity of the group. The heap operation should be thought of as a non-abelian generalization of the notion of an affine space.
Proposition: $(\phi,\psi)$ is an automorphism of the round function $F$. Then let $a=\phi(e)$, and let $L:K\rightarrow K$ be the mapping defined by $L(k)=a^{-1}\phi(k)$. Then $L$ is a group automorphism.
Proof: Observe that $\phi(k)=aL(k)$ for each $k\in K$. Therefore,
$$L(h^{-1}k)=a^{-1}\phi(h^{-1}k)=a^{-1}\phi(eh^{-1}k)=a^{-1}\phi(e)\phi(h)^{-1}\phi(k)$$
$$=a^{-1}a\phi(h)^{-1}\phi(k)=\phi(h)^{-1}\phi(k)=L(h)^{-1}a^{-1}aL(k)=L(h)^{-1}L(k).$$
Therefore, the mapping $L$ is a group homomorphism. Since $\phi$ is bijective, $L$ is bijective as well. Q.E.D.
Now define $M(k)=\phi(k)a^{-1}$. Then $M$ is an automorphism where
$\phi(k)=M(k)a$ for $k\in K$. Therefore,
$$\psi(j\cdot k^{-1}\cdot x)=\phi(j)\phi(k)^{-1}\psi(x)=M(j)aa^{-1}M(k)\psi(x)=M(jk)\psi(x).$$ Therefore,
$\psi(k\cdot x)=M(k)\psi(x)$ for each $k\in K$.
The case when $K$ and $X$ are isomorphic groups and the group action is left multiplication
Suppose now that $K,X$ are groups and $\iota:K\rightarrow X$ is an isomorphism. Then suppose that the action of $K$ on $X$ is defined by $k\cdot x=\iota(k)x$ for each $k\in K,x\in X$. Then $F_{k}(x)=F(k,x)=k\cdot P(x)=\iota(k)P(x)$ for each $k\in K,x\in X$.
We have $\psi(\iota(k)x)=\iota(M(k))\psi(x)$ for all $k,x$. In particular, if $b=\psi(e)$, then $$\psi(\iota(k))=\psi(\iota(k)e)=\iota(M(k))\psi(e)=\iota(M(k))b.$$ Said differently, if $x=\iota(k)$, then $\psi(x)=\iota(M(k))b=\iota(M(\iota^{-1}(x)))b$. For the remainder of this post, we shall write $M'=\iota\circ M\iota^{-1}$ and let $L'=\iota\circ L\circ\iota^{-1}$.
In fact, you can easily define the operation $X^{3}\rightarrow X,(x,y,z)\mapsto xy^{-1}z$ from the operation $K^{2}\times X\rightarrow X(j,k,x)\mapsto\iota(jk^{-1})x$.
Conjugation by P
In the case where $K$ is simply acting on $X$ but where $X$ is not necessarily a group, we have $$F_{j}^{-1}(F_{k}(x))=P^{-1}(j^{-1}\cdot F_{k}(x))=P^{-1}(j^{-1}\cdot k\cdot P(x))=P^{-1}(j^{-1}k\cdot P(x)).$$
Now, assume that we are in the case when $k\cdot x=\iota(k)x$ and $\iota:K\rightarrow X$ is a group isomorphism. Then
Observe that $F_{j}^{-1}F_{k}(x)=P^{-1}[\iota(j^{-1}k)\cdot P(x)].$
Suppose that $F_{j}^{-1}F_{k}(u)=v$. Then $F_{j}^{-1}F_{k}(x)=P^{-1}[P(v)P(u)^{-1}P(x)]$, so conjugate heap operation
$(x,y,z)\mapsto P^{-1}[P(x)P(y)^{-1}P(z)]$ is also definable from $F$.
Therefore, the mappings $\psi,P\psi P^{-1}$ must both be heap automorphisms.
SP networks
Suppose that $K=U^{n},X=V^{n}$ where $U,V$ are groups, and $I:U\rightarrow V$ is an isomorphism. Suppose that $\iota:K\rightarrow X$ is the isomorphism defined by letting $\iota((r_{k})_{k})=(I(r_{k}))_{k}$.
Let $s:V\rightarrow V$ be a bijection. Define a mapping $S:V\rightarrow V$ by letting $S((v_{k})_{k})=(s(v_{k}))_{k}$. Let $\Gamma:X\rightarrow X$ be a heap automorphism.
Suppose that $P=\Gamma\circ S$. As before, we assume that $F_{k}(x)=\iota(k)P(x)$.
Observe that $P^{-1}[P(x)P(y)^{-1}P(z)]=S^{-1}[S(x)S(y)^{-1}S(z)]$, so the mappings $\psi,S\psi S^{-1}$ must both be heap automorphisms.
Let $\mathcal{V}=(V,\Omega,\mho)$ be the algebraic structure where $\Omega,\mho$ are the ternary operations defined by letting
$\Omega(x,y,z)=xy^{-1}z,\mho(x,y,z)=s^{-1}[s(x)s(y)^{-1}s(z)]$. Then $\psi$ must be an automorphism of $\mathcal{V}^{n}$. We shall now limit the possibilities for automorphisms $\psi$ when the function $s$ satisfies nice properties.
We say that $s$ is skew-rigid if whenever
$E:\mathcal{V}\times\mathcal{V}\rightarrow\mathcal{V}$ is an endomorphism,
the mapping $x\mapsto E(e,x)$ is a constant mapping or the mapping
$x\mapsto E(x,e)$ is a constant mapping.
Theorem: If the mapping $s$ is skew-rigid, then there is a bijection $W:\{0,\dots,n-1\}\rightarrow\{0,\dots,n-1\}$ along with automorphisms
$\psi_{0},\dots,\psi_{n-1}$ of $\mathcal{V}$ such that
$\psi((x_{k})_{k=0}^{n-1})=(\psi_{j}(x_{W(j)}))_{j=0}^{n-1}$ whenever $x_{0},\dots,x_{n-1}\in V$.
Proof:
Define a mapping $\text{in}_{j}:V\rightarrow V^{n}$ by letting
$\text{in}_{j}(r)=(r_{k})_{k=0}^{n-1}$ by letting $r=r_{j}$ and $r_{k}=e$ whenever $k\neq j$. Define a mapping $\text{jn}_{j}:V^{n}\rightarrow V$ by letting $\text{jn}_{j}(r_{k})_{k=0}^{n-1}=r_{j}.$
Now, observe that $\text{jn}_{j}\circ\psi\circ\text{in}_{i}:\mathcal{V}\rightarrow\mathcal{V}$ is an endomorphism for all $i,j$, so let
$\psi_{i,j}=\text{jn}_{j}\circ\psi\circ\text{in}_{i}$. Since $\mathcal{V}^{n}$ is generated by the subalgebras of the form
$\text{in}_{i}[\mathcal{V}]$, we observe that the automorphism $(\theta,\psi)$ is completely determined by the matrix of endomorphisms $(\psi_{i,j})_{i,j}$ of $\mathcal{V}$.
Suppose that $s$ is skew-rigid. Suppose $i\neq j$. Let
$\text{in}_{i,j}:\mathcal{V}^{2}\rightarrow\mathcal{V}^{n}$ be the mapping defined by letting $\text{in}_{i,j}(u,v)=(v_{k})_{k}$ precisely when $v_{i}=u,v_{j}=v$, and $v_{k}=e$ whenever $k\not\in\{i,j\}$.
Suppose $i\neq j$ and $E=\text{jn}_{k}\circ\psi\circ\text{in}_{i,j}$.
Then $E$ is a homomorphism from $\mathcal{V}^{2}$ to $\mathcal{V}.$
Furthermore, $E(x,e)=\psi_{i,k}(x),E(e,x)=\psi_{j,k}(x)$, so the mapping $\psi_{i,k}$ is a constant mapping or the mapping $\psi_{j,k}$ is a constant mapping.
For each $n$, there is an $n+1$-ary term $t$ in the language of $\mathcal{V}$ such that $t(x_{1},\dots,x_{n},e)=x_{1}\dots x_{n}$ and more generally that $$t(x_{1},\dots,x_{n},a)=x_{1}a^{-1}x_{2}\dots x_{n-1}a^{-1}x_{n}.$$
We have
$$(x_{k})_{k=0}^{n-1}=t(\text{in}_{0}(x_{0}),\dots,\text{in}_{n-1}(x_{k}),e),$$ so
$$\text{jn}_{j}\psi((x_{k})_{k=0}^{n-1})=t(\text{jn}_{j}\psi(\text{in}_{0}(x_{0})),\dots,\text{jn}_{j}\psi(\text{in}_{n-1}(x_{k})),\text{jn}_{j}\psi(e)).$$ Therefore, there is some $i$ where
the mapping $\text{jn}_{j}\psi\text{in}_{i'}$ is a constant function whenever $i'\neq i.$ Therefore, there is a function $W:\{0,\dots,n-1\}\rightarrow\{0,\dots,n-1\}$ where for all $j$, there is some endomorphism
$\psi_{j}:\mathcal{V}\rightarrow\mathcal{V}$ with
$\text{jn}_{j}\psi((x_{k})_{k=0}^{n-1})=\psi_{j}(x_{W(j)})$.
Therefore, we have $\psi((x_{k})_{k=0}^{n-1})=(\psi_{j}(x_{W(j)}))_{j=0}^{n-1}$. Since the mapping $\psi$ is an automorphism and since $\mathcal{V}$ is finite, we know that each mapping $\psi_{j}$ must also be an automorphism and the mapping $W$ must be a bijection.
Q.E.D.
The above theorem states that whenever $s$ is skew rigid, the automorphism group of $F$ is therefore a subgroup of the wreath product of $\text{Aut}(\mathcal{V})$ with the symmetric group $S_{n}$.
We can also guarantee that the automorphism group of $F$ is reasonably tame under different assumptions about the S-boxes.
Let $H$ be the group of permutations of $X$ generated by all permutations of the form $F_{j}\circ F_{k}^{-1},F_{j}^{-1}\circ F_{k}$. We shall now show that under reasonable hypotheses, the decomposition of $H$ has an internal direct product of directly indecomposable groups is unique.
From a weakened version of the Krull-Schmidt theorem, we know that if $G_{1},\dots,G_{\alpha},H_{1},\dots,H_{\beta}$ are finite directly indecomposable groups, and
$G_{1}\times\dots\times G_{\alpha}\simeq H_{1}\times\dots\times H_{\beta}$, then $\alpha=\beta$, and there is a bijection $\zeta:\{1,\dots,\alpha\}\rightarrow\{1,\dots,\beta\}$ where $G_{i}\simeq H_{\zeta(i)}$ for $1\leq i\leq\alpha$.
If $j\in U$, then let $s_{j}$ be the permutation of $V$ defined by letting
$s_{j}(v)=I(j)s(v)$ whenever $v\in V$. Let $H^{-}$ be the group of permutations of $V$ generated by the mappings $s_{j}\circ s_{k}^{-1},s_{j}^{-1}\circ s_{k}$.
Lemma: Suppose that for $1\leq i\leq n$, $\Delta_{i}$ is a finite non-abelian group with $|\Delta_{i}|>1$ and where if $A,B\subseteq\Delta_{i}$ are subgroups such that $ab=ba$ whenever $a\in A,b\in B$ and where $\Delta_{i}=AB$, then either $|A|=1$ or $|B|=1.$ Then whenever $\phi:\Delta_{1}\times\dots\times\Delta_{n}\rightarrow\Delta_{1}\times\dots\times\Delta_{n}$ is an automorphism, there is a permutation $\rho$ of $\{1,\dots,n\}$ and isomorphisms $\phi_{i}:\Delta_{\rho(i)}\rightarrow\Delta_{i}$ where
$\phi(x_{1},\dots,x_{n})=(\phi_{1}(x_{\rho(1)}),\dots,\phi_{n}(x_{\rho(n)}))$.
Proof: Observe that if $A_{1},\dots,A_{r}$ are subgroups of $\Delta_{i}$ and $ab=ba$ whenever $a\in A_{i},b\in A_{j},i\neq j$ and $\Delta_{i}=A_{1}\dots A_{r}$, then there exists an $i$ where $\Delta_{i}=A_{i}$ and where
$|A_{j}|=1$ whenever $j\neq i$. This observation can be established by induction on $r$.
For $i,j\in\{1,\dots,n\}$, let $\pi_{j}:\Delta_{1}\times\dots\times\Delta_{n}\rightarrow\Delta_{j}$ be the projection mapping, and let $\iota_{i}:\Delta_{i}\rightarrow\Delta_{1}\times\dots\times\Delta_{n}$ be the inclusion mapping. Then whenever $i_{1}\neq i_{2}$ and
$a\in\pi_{j}[\phi[\iota_{i_{1}}[\Delta_{i_{1}}]]],b\in\pi_{j}[\phi[\iota_{i_{2}}[\Delta_{i_{2}}]]]$, we have $ab=ba$. Since
$\Delta_{j}$ is generated by the subgroups $\pi_{j}[\phi[\iota_{i}[\Delta_{i}]]]$, we know that for all $j$, there exists at most one $i$ where $|\pi_{j}[\phi[\iota_{i}[\Delta_{i}]]]|>1$ and for such $i$, we have
$\Delta_{j}=\pi_{j}[\phi[\iota_{i}[\Delta_{i}]]]$. In other words, there is a function $\rho:\{1,\dots,n\}\rightarrow\{1,\dots,n\}$ where
$\pi_{j}[\phi[\iota_{\rho(j)}[\Delta_{\rho(j)}]]]=\Delta_{j}$ for all $j$ but where $|\pi_{j}[\phi[\iota_{i}[\Delta_{i}]]]|=1$ whenever $\rho(j)\neq i$.
Therefore, we have
$\phi(x_{1},\dots,x_{n})=(\pi_{1}\phi\iota_{\rho(1)}(x_{\rho(1)}),\dots,\pi_{n}\phi\iota_{\rho(n)}(x_{\rho(n)})).$
Since $\phi$ is an isomorphism, the mapping $\rho$ is a bijection, and
$\pi_{j}\phi\iota_{\rho(j)}$ is an isomorphism from $\Delta_{\rho(j)}$ to $\Delta_{j}$ for all $j$.
Q.E.D.
Lemma: Suppose that $G$ is a group that is factorizable as an internal direct product of subgroups $\Delta_{1},\dots,\Delta_{m}$. Furthermore, suppose that for all $i$, if $A,B$ are subgroups of $\Delta_{i}$ with
$ab=ba$ for each $a\in A,b\in B$ and $\Delta_{i}=AB$. Then if $\Delta_{1}^{\sharp},\dots,\Delta_{n}^{\sharp}$ is another factorization of $G$ as an internal direct product of subgroups, then $m=n$, and there is some permutation $\rho$ of $\{1,\dots,m\}$ where $\Delta_{i}=\Delta_{\rho(i)}^{\sharp}$ for all $i$.
Proof: By the Krull-Schmidt theorem, we know that $m=n$, and there is a permutation $\rho$ of $\{1,\dots,m\}$ where $\Delta_{i}\simeq \Delta_{\rho(i)}^{\sharp}$ for all $i$. Therefore, there is some automorphism $\phi$ of $G$ where $\phi$ restricts to an isomorphism mapping $\Delta_{i}$ onto $\Delta_{\rho(i)}^{\sharp}$ for all $i$. However, by the above lemma, we know that there is some permutation $f$ of $\{1,\dots,m\}$ where $\phi[\Delta_{i}]=\Delta_{f(i)}$ for all $i$. Therefore,
$\Delta_{\rho(i)}^{\sharp}=\Delta_{f(i)}$ for all $i$. Q.E.D.
Theorem: Suppose that whenever $A,B$ are subgroups of $H^{-}$ with
$ab=ba$ for $a\in A,b\in B$, either $|A|=1$ or $|B|=1$. Then if $(\phi,\psi)$ is an automorphism of $F$, then there is a bijection $W:\{0,\dots,n-1\}\rightarrow\{0,\dots,n-1\}$ along with automorphisms
$\psi_{0},\dots,\psi_{n-1}$ of $\mathcal{V}$ such that
$\psi(x_{0},\dots,x_{n-1})=(\psi_{0}(x_{W(0)}),\dots,\psi_{n-1}(x_{W(n-1)}))$ whenever $x_{0},\dots,x_{n-1}\in V.$
Proof: Suppose that $(\phi,\psi)$ is an automorphism of $V$. Then $\psi$ preserves the group $H$ in the sense that $H=\psi H\psi^{-1}$. Therefore, if $H$ is factored as an internal direct product of subgroups $H_{0},\dots,H_{n-1}$, then the group $H$ is definable in $(F,K,X)$, and the set of subgroups $\{H_{0},\dots,H_{n-1}\}$ is also definable in $(F,K,X)$.
Therefore, there is a bijection $\rho:\{0,\dots,n-1\}\rightarrow\{0,\dots,n-1\}$ such that $H_{\rho(i)}=\psi H_{i}\psi^{-1}$ for all $i$. Therefore, we have $H_{\rho(i)}\psi=\psi H_{i}$.
Suppose that $\pi_{0},\dots,\pi_{n-1}:X\rightarrow V$ are the projection functions. Suppose that $\hat{H}_{i}$ is the group generated by all $H_{j}$'s such that $i\neq j$.
Then $\pi_{i}(x)=\pi_{i}(y)$ if and only if $h(x)=y$ for some $h\in\hat{H}_{i}$ if and only if $\psi^{-1}h'\psi(x)=y$ for some $h'\in\hat{H}_{\rho(i)}$ if and only if $h'\psi(x)=\psi(y)$ for some
$h'\in\psi\hat{H}_{i}\psi^{-1}=\hat{H}_{\rho(i)}$ if and only if
$\pi_{\rho(i)}\psi(x)=\pi_{\rho(i)}(\psi(y))$.
Therefore, $\psi(x_{0},\dots,x_{n-1})=(\psi_{0}(x_{\rho^{-1}(0)}),\dots,\psi_{n-1}(x_{\rho^{-1}(n-1)})$ for some bijections
$\psi_{0},\dots,\psi_{n-1}$. Furthermore, the bijections $\psi_{0},\dots,\psi_{n-1}$ must be automorphisms of $\mathcal{V}$.
Q.E.D.
The above theorem applies when $H^{-}$ is either the alternating or symmetric group on a set of more than $4$ elements. In particular, the above theorem applies to the AES block cipher. A proof that for AES, the group $H^{-}$ is the alternating group can be found in Ralph Wernsdorf's paper that proves that the round permutations of the AES block cipher generate the alternating group of all permutations of $\{1,\dots,2^{128}\}.$