> According to my knowledge, multiplication in a finite field is defined as mul(a,b) = (a*b)%mod, where mod is the irreducible polynomial.
Actually, there are other plausible ways to represent values in a finite field, and those representations lead to different multiplication routines.
However, that's not the case here.
> But when I saw the implementation here, it uses bitmasking to multiply.
Well, yes, it is effectively implementing $(a \times b) \bmod mod$; it's just doing it in a different way.
Instead of doing the full multiplication $a \times b$ and then a single modular reduction at the end, it scans through the bit representation of $a$, and for each bit $i$ that is set, it does $result = result + 2^i \times b$. The modular reduction effectively happens when it updates $2^i \times b$ for the next value of $i$ (that running value is stored in the variable $v$). The shifting and masking of $v$ is exactly that update of $v$ as $i$ is incremented.
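To make that concrete, here's a minimal sketch of the shift-and-add approach in plain Python (not the cited code), assuming the conventional bit order where bit $i$ stands for $x^i$ and the modulus $x^{128} + x^7 + x^2 + x + 1$:

```python
# Minimal shift-and-add multiplication in GF(2^128); a sketch, not the cited code.
# Assumes bit i of an integer stands for x^i, and reduction modulo
# x^128 + x^7 + x^2 + x + 1.
R = (1 << 128) | 0x87   # x^128 + x^7 + x^2 + x + 1

def gf_mul(a: int, b: int) -> int:
    result = 0
    v = b                    # v holds (2^i * b) mod R for the current bit i
    while a:
        if a & 1:            # bit i of a is set ...
            result ^= v      # ... so add 2^i * b (addition in GF(2^k) is xor)
        a >>= 1
        v <<= 1              # advance v to 2^(i+1) * b ...
        if v >> 128:         # ... reducing if it overflowed past bit 127
            v ^= R
    return result
```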
Oh, and they do the conditional addition in an obscure way: instead of a straightforward "if the bit is set, then z = z ^ v" (addition in $GF(2^k)$ is implemented by xor), they instead play with mask2 to replace z with either z or z ^ v, depending on whether the bit is set. This obscure approach would make sense if they were aiming for constant time; however, they're in Python, which makes no constant-time guarantees, and, more importantly, they iterate until they run out of set bits (that is, until f2 is zero), so the number of iterations depends on one of the inputs.
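For illustration, here is a branch-free variant of that loop in the style described above; the names f2, v, z and mask2 mirror the description, but the code is a sketch under the same assumptions as before, not the cited implementation:

```python
# A branch-free sketch of the masked conditional add described above; the
# names f2, v, z, mask2 follow the description, R is the same assumed modulus.
R = (1 << 128) | 0x87

def gf_mul_masked(f1: int, f2: int) -> int:
    z = 0
    v = f1                        # v tracks the running (2^i * f1) mod R
    while f2:                     # note: still stops once f2 runs out of set bits
        mask2 = -(f2 & 1)         # all-ones if the low bit of f2 is set, else 0
        z ^= v & mask2            # replaces "if bit set: z ^= v" without a branch
        f2 >>= 1
        v <<= 1
        v ^= R & -(v >> 128)      # reduce when the shift carried into bit 128
    return z
```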
As for why you're not getting the same result, well:
- Are you in the same field? You said $GF(2^{256})$; however, the Python code you cited works in $GF(2^{128})$. Did you convert the Python code to work with 256 bits?
- Are you using the same irreducible polynomial?
- Are you using the same endianness?
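If it helps with checking the first two points, the earlier sketch can be parametrized by width and modulus; the defaults below are the GF(2^128) values assumed above, and you'd pass whatever width and irreducible polynomial your 256-bit code actually uses:

```python
# Parametrized variant of the earlier sketch; defaults are the assumed
# GF(2^128) values, not anything taken from the code in the question.
def gf_mul_generic(a: int, b: int,
                   width: int = 128,
                   poly: int = (1 << 128) | 0x87) -> int:
    result = 0
    top = 1 << width              # bit that signals an overflow needing reduction
    while a:
        if a & 1:
            result ^= b           # add the current multiple of the original b
        a >>= 1
        b <<= 1                   # multiply b by x ...
        if b & top:
            b ^= poly             # ... and reduce when it reaches degree `width`
    return result
```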