Performance is not a generic property of either a cryptographic operation or a type of execution environment. It's a property of a specific implementation in a specific environment. So it is not generically true that “it take[s] more time to decrypt the same file with 256 bit RSA in TEE than in REE”. However, there are plausible reasons why this could be the case.
(Background: a TEE (Trusted Execution Environment) is an environment that is isolated from the main operating system (the REE: Rich Execution Environment) by a combination of hardware and software means. This can mean running on a different processor, on the same processor in a different virtual machine, or on the same processor with an isolation technology such as SGX on x86 or TrustZone on Arm. The terms “TEE” and “REE” are mostly used by people who work with TrustZone or SGX.)
A TEE running on the main processor does have all the computing power of the processor, so if it's running the exact same code, it will go as fast. However, in practice, it's likely that the TEE isn't running the same code as the REE.
SGX and TrustZone protect the TEE against direct attacks, such as REE code attempting to read memory used by the TEE. But they don't protect against side channel attacks. For example, the cache is shared between the TEE and the REE: the REE can't read a TEE cache line, but it can observe when the TEE displaces a cache line, so it can conduct cache timing attacks. As a consequence, software running in the TEE must protect itself against cache timing attacks. The TEE is especially vulnerable to timing attacks because its attacker, the REE operating system, has access to finer-grained timers than an unprivileged REE process attacking another process would have. Defenses against side channel leakage cost performance: constant-time code must avoid branches and memory accesses that depend on secret data, which often means doing extra work.
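To make the cost of such defenses concrete, here is a sketch comparing two ways of computing the modular exponentiation at the heart of RSA decryption. Plain left-to-right square-and-multiply does an extra multiplication only for the 1 bits of the secret exponent, so its running time and code path depend on the key. A Montgomery ladder with a branch-free conditional swap does the same two multiplications for every bit, trading extra work for a key-independent sequence of operations. The function names are illustrative, the toy key (p = 61, q = 53) is far too small for real use, and Python's big integers are not constant-time themselves, so this only shows the structure and the operation count, not a usable side-channel-resistant implementation.

```python
def square_and_multiply(base, exp, mod):
    """Left-to-right binary exponentiation. Returns (result, multiplications)."""
    nbits = mod.bit_length()  # fixed iteration count; assumes exp < 2**nbits
    result, mults = 1, 0
    for i in reversed(range(nbits)):
        result = result * result % mod
        mults += 1
        if (exp >> i) & 1:  # secret-dependent branch: extra work only for 1 bits
            result = result * base % mod
            mults += 1
    return result, mults


def cswap(a, b, bit):
    """Swap a and b when bit == 1, using masking instead of a branch."""
    mask = -bit  # 0 or -1 (all ones in two's complement)
    t = mask & (a ^ b)
    return a ^ t, b ^ t


def montgomery_ladder(base, exp, mod):
    """Montgomery ladder: two multiplications per bit regardless of its value."""
    nbits = mod.bit_length()
    r0, r1, mults = 1, base % mod, 0  # invariant: r1 == r0 * base (mod mod)
    for i in reversed(range(nbits)):
        bit = (exp >> i) & 1
        r0, r1 = cswap(r0, r1, bit)
        r1 = r0 * r1 % mod
        r0 = r0 * r0 % mod
        r0, r1 = cswap(r0, r1, bit)
        mults += 2
    return r0, mults


# Textbook toy key: n = 61 * 53, e = 17, d = 2753 (hopelessly insecure).
n, e, d = 3233, 17, 2753
c = pow(65, e, n)
print(square_and_multiply(c, d, n))  # (65, 17)
print(montgomery_ladder(c, d, n))    # (65, 24)
```

The ladder always does 2 multiplications per bit, while square-and-multiply does one per bit plus one per 1 bit, so the uniform version is slower on average in exchange for not revealing the exponent's bit pattern through its work.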
In addition to countermeasures in the TEE's cryptography implementation, the TEE operating system may have countermeasures such as more frequent cache eviction, which also improve security to the detriment of performance.
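As a rough illustration of why cache eviction costs performance, here is a toy model, not a description of how any particular TEE OS works: a fully associative LRU cache, optionally flushed every `flush_every` accesses as a hypothetical stand-in for eviction on switches between worlds. The names `count_misses` and `flush_every` and the access trace are made up for this sketch.

```python
from collections import OrderedDict


def count_misses(trace, capacity, flush_every=None):
    """Simulate a fully associative LRU cache and count misses.

    If flush_every is set, the whole cache is cleared every flush_every
    accesses (a hypothetical model of eviction as a side channel countermeasure).
    """
    cache = OrderedDict()
    misses = 0
    for i, addr in enumerate(trace):
        if flush_every and i > 0 and i % flush_every == 0:
            cache.clear()
        if addr in cache:
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            cache[addr] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


# A loop that keeps touching the same 8 cache lines (hypothetical working set).
trace = list(range(8)) * 100
print(count_misses(trace, capacity=16))                  # 8
print(count_misses(trace, capacity=16, flush_every=40))  # 160
```

Without flushing, only the first touch of each line misses; with periodic flushing, the working set has to be reloaded after every flush, and each reload is a cache miss.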
It's also possible that the TEE implementation of cryptography is simply less well optimized than the REE implementation. The REE is probably running OpenSSL or a fork of OpenSSL, and OpenSSL is where most optimization effort goes outside of processor- and algorithm-specific hand-optimized assembly code. Less manpower tends to go into cryptographic optimization in a TEE.
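One example of an optimization that can differ between implementations (not a claim about what any specific TEE does) is RSA decryption using the Chinese Remainder Theorem: instead of one exponentiation modulo n, it does two half-size exponentiations modulo p and q and recombines the results, which is typically several times faster. PKCS #1 private keys store the precomputed values d mod (p-1), d mod (q-1) and q^-1 mod p for this purpose. The sketch below uses a hypothetical toy key and needs Python 3.8+ for `pow(q, -1, p)`.

```python
def rsa_decrypt_plain(c, d, n):
    """Straightforward RSA decryption: one full-size exponentiation mod n."""
    return pow(c, d, n)


def rsa_decrypt_crt(c, d, p, q):
    """RSA decryption via the Chinese Remainder Theorem (Garner's recombination)."""
    dp = d % (p - 1)       # exponent reduced mod p-1
    dq = d % (q - 1)       # exponent reduced mod q-1
    q_inv = pow(q, -1, p)  # modular inverse of q mod p (Python 3.8+)
    m1 = pow(c, dp, p)     # half-size exponentiation mod p
    m2 = pow(c, dq, q)     # half-size exponentiation mod q
    h = q_inv * (m1 - m2) % p
    return m2 + h * q


# Toy key (p = 61, q = 53), far too small for real use.
p, q, e, d = 61, 53, 17, 2753
n = p * q
c = pow(65, e, n)
print(rsa_decrypt_plain(c, d, n), rsa_decrypt_crt(c, d, p, q))  # 65 65
```

Both functions compute the same result; the difference is only in how much arithmetic is needed, which is the kind of gap that separates a heavily tuned library from a simpler one.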
If you're measuring timing from an REE application, there's a cost to dispatching the operation to the TEE. That cost is typically negligible for asymmetric cryptography, especially RSA decryption, which requires a large amount of computation. For symmetric cryptography, where each operation does comparatively little work, the dispatch cost can be substantial.
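A simple cost model shows why the same fixed dispatch cost matters for one kind of operation and not the other. It assumes each call into the TEE pays a fixed `dispatch_cost` on top of the work itself; the function name and the cost figures below are hypothetical, in arbitrary units, and chosen only to show the shape of the effect, not measured on any real system.

```python
def dispatch_overhead_fraction(dispatch_cost, work_cost):
    """Share of the measured time spent on the REE->TEE round trip.

    Toy model (an assumption, not a measurement): total time is a fixed
    per-call dispatch cost plus the cost of the cryptographic work.
    """
    return dispatch_cost / (dispatch_cost + work_cost)


# Hypothetical costs in arbitrary units.
DISPATCH = 1.0
print(dispatch_overhead_fraction(DISPATCH, work_cost=1.0))     # 0.5: small symmetric operation
print(dispatch_overhead_fraction(DISPATCH, work_cost=1000.0))  # about 0.001: heavy RSA decryption
```

When the work dwarfs the dispatch cost, the overhead all but disappears from the measurement; when the two are comparable, it can dominate.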
You don't say what type of TEE you're looking at. A TEE can be implemented as a separate chip, which is usually a lower-performance one since it's special-purpose and therefore has to be cheap. For example, a discrete TPM is a lot slower than even the cheapest application processor that it can be plugged into, and even dedicated accelerators for asymmetric cryptography don't fully compensate.
P.S. 256-bit RSA was ridiculously weak even in the 1980s. 2048 bits (256 bytes) is a typical RSA key size these days.