The distinction between depth and degree depends on the topology of the circuit.
For arithmetic circuits, given any polynomial $p(x)$, there are many equivalent circuits that compute it.
Each circuit corresponds to an explicit way of associating the operations in $p(x)$.
For a basic example, consider $p(x_1,x_2,x_3) = x_1x_2+x_3x_2 + x_3.$
One can write this in various ways, for example
$$p_1(x_1,x_2,x_3) = (x_2(x_1+x_3))+x_3,$$
or
$$p_2(x_1,x_2,x_3) = (x_3(1+x_2) + (x_1x_2)).$$
These define two alternative circuits (writing them down explicitly may be useful), but this doesn't help with your overall question: both have the same depth (namely 3).
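To write the two circuits down explicitly, here is a minimal Python sketch. The nested-tuple representation and the helper names `evaluate`, `depth` and `agree_on_grid` are my own illustration, not from any library. It builds $p_1$ and $p_2$ as trees, checks that they agree with $p$ on a small grid of integer inputs, and computes the depth, counting every gate (addition or multiplication) as one level.

```python
from itertools import product

# A circuit is a nested tuple (op, left, right) with op in {"+", "*"}.
# Leaves are variable names (str) or integer constants.

def evaluate(node, env):
    """Evaluate a circuit on the variable assignment `env`."""
    if isinstance(node, tuple):
        op, left, right = node
        a, b = evaluate(left, env), evaluate(right, env)
        return a + b if op == "+" else a * b
    return env[node] if isinstance(node, str) else node

def depth(node):
    """Number of gates on the longest path from a leaf to the output."""
    if not isinstance(node, tuple):
        return 0
    _, left, right = node
    return 1 + max(depth(left), depth(right))

# p1 = (x2 * (x1 + x3)) + x3
p1 = ("+", ("*", "x2", ("+", "x1", "x3")), "x3")
# p2 = (x3 * (1 + x2)) + (x1 * x2)
p2 = ("+", ("*", "x3", ("+", 1, "x2")), ("*", "x1", "x2"))

def reference(x1, x2, x3):
    """The polynomial p written out term by term."""
    return x1 * x2 + x3 * x2 + x3

def agree_on_grid(circuit, values=range(-2, 3)):
    """Check that the circuit matches `reference` on all points of a small grid."""
    for x1, x2, x3 in product(values, repeat=3):
        env = {"x1": x1, "x2": x2, "x3": x3}
        if evaluate(circuit, env) != reference(x1, x2, x3):
            return False
    return True

print(agree_on_grid(p1), agree_on_grid(p2))  # True True
print(depth(p1), depth(p2))                  # 3 3
```

Agreement on a finite grid is only a sanity check, not a proof of equality, although here expanding both expressions by hand confirms it.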
A standard example of where depth and degree diverge is $p(x) = x^{2^k}$.
One can associate this as
$$p_1(x) = \overbrace{x(x(\dots x)\dots )}^{2^k\text{ times}}.$$
This corresponds to a circuit of depth $2^k - 1$ (the circuit is a "straight line", with one multiplication per additional factor of $x$) to compute the degree $2^k$ polynomial.
One can alternatively write this as a "full binary tree".
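First you compute $x^2 = x\cdot x$, then $x^4 = x^2\cdot x^2$ (computing a degree 4 polynomial in depth 2), then $x^{8} = x^4\cdot x^4$ (degree 8 in depth 3), and so on.
After $k$ squarings you have $x^{2^k}$ in depth $k$, so the degree is exponential in the depth.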
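The two ways of associating $x^{2^k}$ can be compared with a small Python sketch. The helpers `mul`, `power_chain` and `power_squaring` are hypothetical names of my own; each value is carried as a `(value, depth)` pair, and a multiplication gate sits one level above its deeper input. Counted this way, the chain uses $2^k - 1$ multiplications in sequence, while repeated squaring uses $k$.

```python
def mul(a, b):
    """Multiply two (value, depth) pairs; the gate is one level above its deeper input."""
    (va, da), (vb, db) = a, b
    return (va * vb, 1 + max(da, db))

def power_chain(x, k):
    """Compute x^(2^k) as x * (x * (... * x)): the "straight line" circuit."""
    leaf = (x, 0)
    acc = leaf
    for _ in range(2**k - 1):
        acc = mul(leaf, acc)
    return acc

def power_squaring(x, k):
    """Compute x^(2^k) by squaring k times: the "full binary tree" circuit."""
    acc = (x, 0)
    for _ in range(k):
        acc = mul(acc, acc)
    return acc

print(power_chain(2, 3))     # (256, 7)
print(power_squaring(2, 3))  # (256, 3)
```

Both functions return the same value, but the depth of the chain grows like $2^k$ while the depth of the squaring circuit grows like $k$.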
Finally, note that depth is not always the right metric to target for efficiency.
First, additions are typically "for free", so we only care about the multiplicative depth: the largest number of multiplications along any path from an input to the output.
Sometimes we care about more esoteric measures though.
For example, in GSW, to compute $p(x)= x^{2^k}$, the "line graph" approach is better than the "full binary tree" approach, since noise growth in GSW multiplication is asymmetric in the two factors.
But in most settings (BGV, B/FV, CKKS), low "multiplicative depth" computations are more efficient.
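As a rough illustration of multiplicative depth, the following Python sketch counts only multiplication gates along each path. The nested-tuple circuit encoding and the function name `mult_depth` are assumptions of mine for illustration, not part of any FHE library, and the sketch ignores scheme-specific details such as whether multiplication by a constant is charged differently.

```python
# A circuit is a nested tuple (op, left, right) with op in {"+", "*"};
# leaves are variable names or integer constants.

def mult_depth(node):
    """Largest number of multiplications on any path from a leaf to the output."""
    if not isinstance(node, tuple):
        return 0
    op, left, right = node
    below = max(mult_depth(left), mult_depth(right))
    return below + (1 if op == "*" else 0)

# The two circuits for x1*x2 + x3*x2 + x3 from earlier in the answer.
p1 = ("+", ("*", "x2", ("+", "x1", "x3")), "x3")
p2 = ("+", ("*", "x3", ("+", 1, "x2")), ("*", "x1", "x2"))

# x^8 as a "line graph": x * (x * (... * x)) with 7 multiplications.
chain = "x"
for _ in range(7):
    chain = ("*", "x", chain)

# x^8 by repeated squaring: 3 levels of multiplications.
x2 = ("*", "x", "x")
x4 = ("*", x2, x2)
x8 = ("*", x4, x4)

print(mult_depth(p1), mult_depth(p2))    # 1 1
print(mult_depth(chain), mult_depth(x8)) # 7 3
```

Under this measure both small circuits have multiplicative depth 1 even though their total depth is 3, which is the sense in which additions are treated as free.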