The question asks for a theoretical approach, so I'll suppose it is not about using existing tools as in this answer (which would be off-topic), and I'll instead assume custom code for the password recovery. The general sketch is to test candidate passwords, approximately from most to least likely, by
- Turning the tested password into a 256-bit AES key.
- Testing that key against a portion of the file corresponding to known plaintext.
Notice that the large file size is immaterial, since only a couple of 16-byte ciphertext blocks need to be examined per candidate.
The algorithm for step 1 depends on the version of openssl enc used for encryption, and on the options used, if any. Older versions of openssl enc derive the key using MD5 and EVP_BytesToKey with the iteration count set to 1, which is a criminal mistake from a security standpoint. The hash changed to SHA-256, reportedly at OpenSSL 1.1.0c. Modern openssl enc can (if option -pbkdf2 or -iter is given) use PBKDF2, with a default iteration count of 10000 unless otherwise specified by the -iter command line option. Notice that the password derivation is usually salted, in which case the encrypted file starts with the 8 bytes 53 61 6c 74 65 64 5f 5f (Salted__ in ASCII), followed by the 8 bytes of salt, which must be supplied to whatever password-to-key derivation function is used. PBKDF2 is not memory-hard and thus obsolete for new designs aiming at being secure, and PBKDF2-HMAC-SHA-256 with 10000 iterations gives little protection against GPU, FPGA or ASIC-based password crackers, but it is still considerably less unsafe than an iteration count of 1 against a CPU-based attack, thanks to the larger iteration count. The new and old derivations are before and after this else statement (at time of writing).
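As an illustration of step 1, here is a minimal Python sketch of the salt detection and of both derivations, using only the standard hashlib module. The function names (split_salt, evp_bytes_to_key, pbkdf2_key) are mine, not OpenSSL's, and the sketch assumes AES-256-CBC (32-byte key, 16-byte IV) and that the key comes first in the derived output; check it against a file encrypted with a known password before relying on it.

```python
import hashlib

MAGIC = b"Salted__"  # 53 61 6c 74 65 64 5f 5f

def split_salt(header: bytes):
    """Return (salt, data_offset); salt is None when the file is unsalted."""
    if header[:8] == MAGIC:
        return header[8:16], 16
    return None, 0

def evp_bytes_to_key(password: bytes, salt, digest="md5", key_len=32, iv_len=16):
    """EVP_BytesToKey-style derivation with the iteration count set to 1.

    digest is "md5" for older openssl enc, "sha256" for newer defaults.
    """
    salt = salt or b""
    out = b""
    block = b""
    while len(out) < key_len + iv_len:
        # Each round hashes the previous digest, then the password, then the salt.
        block = hashlib.new(digest, block + password + salt).digest()
        out += block
    return out[:key_len], out[key_len:key_len + iv_len]

def pbkdf2_key(password: bytes, salt, iterations=10000, digest="sha256",
               key_len=32, iv_len=16):
    """PBKDF2-HMAC derivation; key and IV are taken from one output stream."""
    out = hashlib.pbkdf2_hmac(digest, password, salt or b"", iterations,
                              key_len + iv_len)
    return out[:key_len], out[key_len:]
```

Only the key is needed for the tests described below, since they use ciphertext blocks other than the first one; the IV is returned for completeness.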
In step 2, we need to find known plaintext. In the case of a TAR file we have two options:
- Every tar file has a size that is a multiple of 512 bytes. PKCS#7 padding is used by openssl enc, thus the CBC-encrypted file will have a size modulo 512 of either 32 if salt was used, or 16 if not (which we can detect as above); and the last block of the padded plaintext is 16 times the byte 0x10.
- Typically, a tar file has 16 times the byte 0x00 at offsets 80…95 (because that's zero-padding for a file name). In the encrypted file, these bytes will be at offsets 96…111 if salt was used or 80…95 if not (which we can detect as above).
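The two options can be turned into a small helper that lists where to look in the encrypted file. This is only a sketch: the name tar_known_plaintext is hypothetical, and the second candidate relies on the assumption stated above that the first file name in the archive is shorter than 80 bytes.

```python
def tar_known_plaintext(ciphertext_size: int, salted: bool):
    """Return (ciphertext_offset, expected_plaintext_block) candidates."""
    header = 16 if salted else 0  # "Salted__" plus 8 bytes of salt
    candidates = []
    # Option 1: a full block of PKCS#7 padding, present only when the
    # plaintext size is a multiple of 512 (hence of 16), as for tar.
    if ciphertext_size % 512 == header + 16:
        candidates.append((ciphertext_size - 16, bytes([0x10]) * 16))
    # Option 2: zero-padding of the first file name, plaintext offsets 80..95.
    candidates.append((header + 80, bytes(16)))
    return candidates
```

For a 10240-byte tar file encrypted with salt, the encrypted file is 10272 bytes long and the candidates are the blocks at offsets 10256 and 96.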
That known plaintext is easily tested since CBC is used: we can decipher the 16-byte ciphertext block that corresponds to the known plaintext block with AES-256 (the block cipher) under the candidate key, XOR the result with the previous ciphertext block, and compare to the known plaintext block. If there's no match, the key was wrong, thus the password was wrong. False positives on a full 16-byte block are so improbable (about 2^-128 per candidate) that they can be ruled out.
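A sketch of that single-block test follows. Python's standard library has no AES, so the block decryption is passed in as a callable; decrypt_block and cbc_block_matches are hypothetical names, and in practice decrypt_block would wrap AES-256 single-block (ECB) decryption from whatever library is at hand. The sketch assumes offset is at least 16, so that a previous ciphertext block exists and the IV is not needed.

```python
from typing import Callable

def cbc_block_matches(decrypt_block: Callable[[bytes, bytes], bytes],
                      key: bytes, ciphertext: bytes,
                      offset: int, expected: bytes) -> bool:
    """Check one CBC block against known plaintext under a candidate key.

    decrypt_block(key, block) must return the raw block-cipher decryption
    of a 16-byte block (no chaining, no padding removal).
    """
    prev = ciphertext[offset - 16:offset]   # previous ciphertext block
    block = ciphertext[offset:offset + 16]  # block holding the known plaintext
    raw = decrypt_block(key, block)
    plain = bytes(a ^ b for a, b in zip(raw, prev))
    return plain == expected
```

Only 32 bytes of the file are read per known-plaintext position, which is why the cost per candidate password is dominated by the key derivation.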
For other file formats the recognition of a correct key at step 2 could be more difficult. E.g. tar.gz files can happen to have byte size modulo 16 equal to 15, so that the known padding is a single byte at 0x01, and a test based on that alone has a false positive rate of 1/256, about 0.4%. However, like most common file formats, they have some fixed or recognizable bytes in the header, thus a reliable test remains possible.
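To put numbers on the reliability of such tests, here is a one-line estimate. It assumes that decryption under a wrong key yields bytes that behave as uniformly random, which is the usual heuristic for a block cipher; false_positive_rate is a hypothetical helper name.

```python
def false_positive_rate(known_bytes: int) -> float:
    """Chance that a wrong key reproduces known_bytes known plaintext bytes."""
    return 256.0 ** -known_bytes

print(false_positive_rate(1))   # single padding byte 0x01: 0.00390625
print(false_positive_rate(16))  # full known block: 2**-128
```

Each additional known byte divides the rate by 256, so even a few fixed header bytes quickly make the test usable as a filter before a fuller check.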