- Can a passive listener break this encryption and/or craft legitimate messages?
A passive listener should not be able to reverse AES, so recovering the plaintext should be infeasible, unless a session is restarted with the same key so that counter values repeat. However, this scheme would be vulnerable to plaintext oracle attacks. Read on.
As for crafting legitimate messages: that's not a passive attack.
- Does this also provide authentication, since no one except the key holder can craft such a message?
No. With a man-in-the-middle attack you can change any bit of the message and leave the magic intact. In CTR mode all bits of the plaintext / ciphertext are independent of each other, so flipping a ciphertext bit flips exactly the corresponding plaintext bit. This also lets an attacker perform plaintext oracle attacks (by changing a bit of the plaintext and then detecting how the system responds to it).
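The bit-flipping problem can be illustrated with a minimal sketch. It assumes a stand-in keystream built from SHA-256 rather than real AES-CTR (the standard library has no AES), and the helper names, the magic value, and the message layout are all hypothetical. The point carries over to any cipher that XORs a keystream onto the plaintext.

```python
import hashlib

def keystream(key: bytes, counter: int, length: int) -> bytes:
    """Stand-in keystream (SHA-256 based, NOT AES); hypothetical helper."""
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + counter.to_bytes(8, "big") + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def ctr_like_crypt(key: bytes, counter: int, data: bytes) -> bytes:
    """Encrypt or decrypt: both are the same XOR with the keystream."""
    return xor_bytes(data, keystream(key, counter, len(data)))

key = b"k" * 16                      # assumed pre-shared key
magic = b"\xde\xad\xbe\xef"          # assumed 32-bit magic
plaintext = magic + b"SET VALVE=0"
ciphertext = ctr_like_crypt(key, 1, plaintext)

# The attacker does not know the key, only guesses the last plaintext byte.
tampered = bytearray(ciphertext)
tampered[-1] ^= ord("0") ^ ord("1")  # turn "0" into "1"

recovered = ctr_like_crypt(key, 1, bytes(tampered))
print(recovered[:4] == magic, recovered[4:])
```

The receiver still sees a valid magic, yet the command has changed, which is why the magic cannot serve as an authentication tag in this mode.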
- Is it OK to use an IV/nonce formed by prepending 0 to the counter? Should I append the counter to a random number (which is also burned in at the factory)?
CTR mode is relatively secure as long as the counter never repeats under the same key and the mode is used correctly, which is not the case here.
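Why a repeated counter is fatal can be shown with a small sketch. Again this assumes a SHA-256 based stand-in keystream instead of AES, and the helper names and messages are made up for illustration. If two messages are encrypted with the same key and counter, XORing the ciphertexts cancels the keystream.

```python
import hashlib

def keystream(key: bytes, counter: int, length: int) -> bytes:
    """Stand-in keystream (SHA-256 based, NOT AES); hypothetical helper."""
    out = b""
    block = 0
    while len(out) < length:
        out += hashlib.sha256(
            key + counter.to_bytes(8, "big") + block.to_bytes(4, "big")
        ).digest()
        block += 1
    return out[:length]

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

key = b"k" * 16
p1 = b"STATUS=OK;T=021C"   # message from the first session
p2 = b"STATUS=ERR;T=35C"   # message after a restart, same counter value

# Counter restarts at 1 in both sessions, so the keystream is identical.
c1 = xor_bytes(p1, keystream(key, 1, len(p1)))
c2 = xor_bytes(p2, keystream(key, 1, len(p2)))

# A passive listener can compute p1 XOR p2 without the key...
leak = xor_bytes(c1, c2)
# ...and recovers p2 completely if p1 is known or guessable.
recovered_p2 = xor_bytes(leak, p1)
print(recovered_p2)
```

Deriving a fresh key per session, or persisting the counter so it never goes backwards, avoids this situation.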
- What should go in the padding, 0 or random numbers? Is it even necessary?
No, for CTR mode padding is not necessary.
- Is using the magic number this way logical? Do I need to generate random magic and send it both unencrypted and encrypted for validation by the receiver?
You'd normally use an authenticated mode to create an authentication tag. 32 bits is usually too small, but depending on the use case it could be enough for real-time systems etc.
- What is the correct/well-established method for encryption and authentication in such a setting?
If you just have AES then CCM mode would be the normal mode.
NB
The scheme you are currently describing seems more appropriate for direct AES encryption or AES-CBC. For instance, if you would pad and encrypt the counter you could use it as IV for AES-CBC. If you would put the value 02 in the padding bytes of the message then you would have PKCS#7 compatible padding. The advantage of this is that the bits in the AES block encryption of the message are now all dependent on each other, hence the magic would work somewhat better.
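As a rough sketch of what PKCS#7 compatible padding means: every padding byte holds the number of padding bytes, so two padding bytes both carry the value 02. The function names and the 14-byte payload below are assumptions for illustration, not part of the original scheme.

```python
BLOCK_SIZE = 16  # AES block size in bytes

def pkcs7_pad(data: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Append N bytes of value N so the length is a multiple of block_size."""
    pad_len = block_size - (len(data) % block_size)
    return data + bytes([pad_len]) * pad_len

def pkcs7_unpad(padded: bytes, block_size: int = BLOCK_SIZE) -> bytes:
    """Check and strip PKCS#7 padding; raise ValueError if it is malformed."""
    if not padded or len(padded) % block_size != 0:
        raise ValueError("invalid padded length")
    pad_len = padded[-1]
    if pad_len < 1 or pad_len > block_size:
        raise ValueError("invalid padding value")
    if padded[-pad_len:] != bytes([pad_len]) * pad_len:
        raise ValueError("invalid padding bytes")
    return padded[:-pad_len]

message = bytes(14)           # hypothetical 14-byte payload (magic + data)
padded = pkcs7_pad(message)   # two bytes of 0x02 are appended
print(padded[-2:].hex(), len(padded))
```

Note that a message whose length is already a multiple of the block size receives a full extra block of padding, which a fixed single-block format may not have room for.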
Your current scheme seems to be limited to one session only. Normally you would derive new session keys for each session.
Even if the mode is changed to CBC, there is a $1 \over 2^{32}$ chance of an attacker creating a message with a valid magic, just by randomly trying. The rest of the message would probably be garbled, but processing garbled text might not be a good idea either. Maybe additional countermeasures could be implemented against that (however, most of those could lead to DoS attacks).
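To get a feeling for that number, assume each forged message is an independent random guess against a 32-bit magic. The chance that at least one of $n$ attempts is accepted is then

$$P(\text{at least one success}) = 1 - \left(1 - 2^{-32}\right)^{n} \approx \frac{n}{2^{32}} \quad \text{for } n \ll 2^{32}.$$

For example, $n = 2^{16}$ attempts give a success probability of roughly $2^{-16} \approx 1.5 \times 10^{-5}$, and on average about $2^{32}$ attempts are needed for one accepted forgery. Whether that is acceptable depends on how many messages the receiver will process before reacting.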