Well, here is a potential 1-round differential with 4 active sboxes; the differential starts with
+-+-+-+-+
|*| |*| |
+-+-+-+-+
| | | | |
+-+-+-+-+
|*| |*| |
+-+-+-+-+
| | | | |
+-+-+-+-+
(where * designates the active matrix locations; that is, the bytes where the differential is nonzero).
The bytesub doesn't change which bytes are active (it only accounts for the 4 active sboxes).
The shiftrow leaves the differential unchanged (the top row is left alone; the lower active row shifts by two, leaving the same bytes active)
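In mixcolumns, the two active columns each see a |X| |X| | input differential (bytes 0 and 2 active). With a branch number of 3, this may result in another |X| |X| | output differential, since 2 active input bytes plus 2 active output bytes gives 4, which is at least 3; whether it actually can depends on the specific pseudo-MDS matrix.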
Finally, the addroundkey also leaves the differential alone, so we end up with the same differential we started with.
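The shiftrow step of the walkthrough above is easy to check mechanically. Here is a minimal Python sketch, assuming the modified AES keeps the standard AES shiftrow offsets (row i rotated left by i); the names `shift_rows` and `PATTERN` are just mine for illustration.

```python
def shift_rows(state):
    """Rotate row i of a 4x4 state left by i positions (standard AES offsets)."""
    return [row[i:] + row[:i] for i, row in enumerate(state)]

# 1 marks an active byte (nonzero difference), 0 an inactive one.
PATTERN = [
    [1, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]

# Row 0 is not moved and row 2 is rotated by two, so the pattern maps to itself.
print(shift_rows(PATTERN) == PATTERN)
```

Only the activity pattern is tracked here; the actual difference values are moved around by shiftrow, but which positions are active stays the same.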
Again, the existence of this differential is consistent with everything you suggested; whether it is actually possible depends on the mixcolumns pseudo-MDS.
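If you have the actual matrix, checking whether the |X| |X| | to |X| |X| | transition exists is a small search. The sketch below assumes the matrix works over the AES field GF(2^8) with reduction polynomial 0x11B; `TOY_MATRIX` is a made-up matrix chosen only so that the transition exists (it is not the matrix from the question, and I make no claim about its other properties), and `AES_MATRIX` is the standard AES mixcolumns matrix, included as a negative example.

```python
def gf_mul(a, b, poly=0x11B):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial (assumed field)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return result

def mix_column(matrix, column):
    """Apply a 4x4 matrix over GF(2^8) to one 4-byte column."""
    out = []
    for row in matrix:
        acc = 0
        for coeff, byte in zip(row, column):
            acc ^= gf_mul(coeff, byte)
        out.append(acc)
    return out

def find_iterative_transition(matrix):
    """Return (input, output) with bytes 0 and 2 active on both sides, or None."""
    # The map is linear, so any valid input can be scaled to have first byte 1.
    for c in range(1, 256):
        column = [1, 0, c, 0]
        out = mix_column(matrix, column)
        if out[0] and out[2] and not out[1] and not out[3]:
            return column, out
    return None

AES_MATRIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
TOY_MATRIX = [[2, 1, 1, 1], [1, 2, 1, 1], [1, 1, 2, 1], [1, 1, 1, 2]]  # hypothetical

print(find_iterative_transition(AES_MATRIX))  # None: ruled out for an MDS matrix
print(find_iterative_transition(TOY_MATRIX))
```

For the real AES matrix the search finds nothing, as expected for branch number 5; for a pseudo-MDS with branch number 3 the answer depends entirely on the matrix entries.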
Assuming that the best multi-round differential in this modified AES is obtained by iterating this 1-round differential, and that we don't have to worry about multiple trails or partial differentials (I don't have a proof of either; the second assumption sounds fishy to me), we get $4r$ active sboxes after $r$ rounds; 5 rounds would give us 20 active sboxes, which appears to be sufficient.
However, given the uncertainties in the above logic, it would be prudent to assume that the number of rounds needed is somewhat larger if you are relying on this as a security assumption.
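To put rough numbers on the $4r$ count, here is a small sketch. It assumes the modified AES keeps the standard AES sbox, whose maximum differential probability is $2^{-6}$, and it treats the sboxes as independent; both are my assumptions, and the result only bounds a single iterated trail, not the full differential.

```python
AES_SBOX_MAX_DP_LOG2 = -6  # assumption: unmodified AES sbox, max DP = 4/256

def active_sboxes(rounds, per_round=4):
    """Active sboxes when the 1-round differential is iterated."""
    return per_round * rounds

def trail_probability_log2(rounds):
    """log2 of the single-trail probability bound under the assumptions above."""
    return active_sboxes(rounds) * AES_SBOX_MAX_DP_LOG2

for r in range(1, 7):
    print(r, active_sboxes(r), trail_probability_log2(r))
```

Under these assumptions, 5 rounds gives 20 active sboxes and a single-trail bound of $2^{-120}$; multiple trails sharing the same input and output differences could push the true differential probability higher.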