TL;DR: Not possible, for several reasons.
I had exactly this problem some decades ago in my DSP patent career, when faced with two competitors' implementations of our idea in encrypted FPGAs.
The code, such as it was, bordered on the trivial: the equivalent of perhaps 10 lines of C. The problem was that there was an internal state of perhaps 200 bits, and the observable output on each clock cycle was only 3 bits. Fortunately, it was possible to control the 40-bit input to the FPGA.
I attacked it as a hidden Markov model. I can place the year roughly, as I recall that our company had 486 PCs on many desks, while a few of the managers and directors had Pentiums. I recorded the output of the FPGA in a file and then, on Friday afternoon, toured the company recruiting every PC I could muster to run my code over the weekend, searching the FPGA output for matching sequences of those 3 bits.
On Monday morning, I harvested the matches, and stitched them together, rather as 'shotgun sequencing' would for DNA.
Faced with competitor A's black box, a reasonable match was visible at a sequence length of 100, and it improved at 1000. There was no further improvement at 10k. By the time I got to 30k, the match was deteriorating. I had to tell my boss that I didn't know what was inside, but that it was not our code.
Black box B was more forthcoming. As I increased the match length, it got better with each decade. I could almost hear the lock tumblers clicking into place in my excitement as I got to 100, then 10k, then 100k. This had our code in it. Or at least, something that behaved identically.
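As a rough illustration of watching the match against sequence length, here is a minimal Python sketch. It is not the original search code. It assumes "match" means the fraction of positions where a reference model's 3-bit output agrees with the recorded output, and the name `agreement_by_length` and the synthetic data are made up for the example.

```python
def agreement_by_length(reference, observed, lengths):
    """Fraction of agreeing symbols over the first n samples, for each n.

    `reference` is what our own model predicts, `observed` is what the
    black box emitted; both are sequences of 3-bit symbols (0..7).
    """
    result = {}
    for n in lengths:
        n = min(n, len(reference), len(observed))
        hits = sum(r == o for r, o in zip(reference[:n], observed[:n]))
        result[n] = hits / n
    return result


# Synthetic data (an assumption for the demo): the black box tracks the
# reference for 500 samples and then goes its own way.
reference = [i % 8 for i in range(1000)]
observed = reference[:500] + [(v + 1) % 8 for v in reference[500:]]

scores = agreement_by_length(reference, observed, [100, 1000])
for n, score in scores.items():
    print(n, score)
```

A score that keeps improving, or holds, as the length grows is the behaviour described for box B; a score that peaks and then falls away is the behaviour described for box A.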
Years later, the truth emerged. Both competitors used an IIR implementation of the basic idea, whereas we had used an FIR implementation. However, A used floating-point arithmetic with an LSB or two of added noise, while B used integer arithmetic, where poles and zeros can cancel exactly. So both were using our basic idea. Neither was running our precise code.
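To see why an integer IIR can be indistinguishable from an FIR while a noisy one is not, consider a minimal Python sketch. The real filters are not described here, so everything in it is an assumption: a 16-tap moving sum stands in for the algorithm, its recursive form y[n] = y[n-1] + x[n] - x[n-16] stands in for the IIR version (the pole at z = 1 is cancelled by a zero of 1 - z^-16), the noise is modelled as a small random value added to the recursive state, and the 3 observable bits are taken from just above the bottom four bits.

```python
import random


def fir_moving_sum(x, n_taps):
    """FIR form: each output is the sum of the last n_taps inputs."""
    return [sum(x[max(0, i - n_taps + 1):i + 1]) for i in range(len(x))]


def iir_moving_sum(x, n_taps, noise=0.0, rng=None):
    """Recursive form of the same sum: y[n] = y[n-1] + x[n] - x[n-n_taps].

    With integer inputs and noise == 0 the state stays an exact integer.
    With noise > 0 a small random error is added to the state each step
    (an assumed noise model) and is never flushed out again.
    """
    out = []
    y = 0.0 if noise else 0
    for i in range(len(x)):
        y += x[i] - (x[i - n_taps] if i >= n_taps else 0)
        if noise:
            y += rng.uniform(-noise, noise)
        out.append(y)
    return out


def observe3(y, shift=4):
    """Only 3 bits of the result are visible from outside (assumed layout)."""
    return (int(round(y)) >> shift) & 0b111


def first_mismatch(a, b):
    """Index of the first differing sample, or None if the sequences agree."""
    for i, (p, q) in enumerate(zip(a, b)):
        if p != q:
            return i
    return None


rng = random.Random(0)
x = [rng.randrange(-128, 128) for _ in range(20000)]

ref = [observe3(v) for v in fir_moving_sum(x, 16)]
exact = [observe3(v) for v in iir_moving_sum(x, 16)]
noisy = [observe3(v) for v in iir_moving_sum(x, 16, noise=1.0,
                                             rng=random.Random(1))]

print("integer IIR first mismatch:", first_mismatch(ref, exact))
print("noisy IIR first mismatch:", first_mismatch(ref, noisy))
```

In this toy model the integer recursion reproduces the FIR output sample for sample, so no amount of observation separates the two, whereas the noisy recursion accumulates error and eventually disagrees with the reference.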
The moral is clear. Even with a problem as small as this one, it's only possible to say that the black box is not running an identical copy of your code. Whether the version it is running is in any way 'equivalent' to yours is a potentially long and expensive argument for patent lawyers.