Score:7

How can you prove that a certain file was downloaded from a certain website?

mf flag

Let's say you downloaded a file from a certain website, and later the website claims that it never made that file available. Is there any way to prove that the website is lying?

Example 1: You download a YouTube video, and the channel later deletes the video and claims that it was never there.

Example 2: A website posts certain content, and later you find that the link is broken and the website no longer has the content. You still have the content and want to prove that it was available there at a specific time.

Any ideas on how to do that?

eagle275 avatar
br flag
I assume having the file in your browser's download history is not enough evidence, although it does show where the file was originally downloaded from. That brings up a small problem with your YouTube example: as far as I know, you don't download videos using your browser's built-in functions. I always use a separate tool, and if I later change the filenames and empty that tool's history and cache, nothing is left pointing to YouTube or to the channel the video came from.
Philipp avatar
cn flag
This isn't a cryptography answer, but the first thing I would try to disprove a claim of "this was never on our website" would be to look for a snapshot on archive.org.
polfosol avatar
in flag
Isn't this question a better fit for the [sister site](https://security.stackexchange.com)?
Dan M. avatar
ro flag
Even if it was and is available, there is generally no way to prove that a specific file was or was not downloaded from a specific website. You might have downloaded it and ALSO acquired the same file by some other means. There is no real way to distinguish between the two if they are bit-identical (metadata such as creation or access time is not reliable). If anything, it's the claimant who must prove that this file was not available.
Marc Ilunga avatar
tr flag
I think it is possible in TLS 1.3, but in a restricted sense. The reason is that in the full handshake, the server (and optionally the client) signs the handshake, and the traffic encryption keys depend on the entire handshake, including the signature. Thus, we can prove that an exchange with the server took place. But then we need the third party to have an independent recording of the communication transcript. Otherwise, we can't guarantee that the client didn't forge the packets.
Score:12
ru flag

This is possible, but only with the co-operation of the website. The downloaded file, together with associated metadata such as the time and the server identity, can be cryptographically hashed to a small but computationally unique value. One can then ask the website to sign this hash value with its certified signing key, which is vouched for by the website's certificate. If the website agrees to do this, the association between the download plus metadata and the website's identity should be computationally hard to forge.
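As a rough illustration of the hashing step only, here is a minimal Python sketch. The record layout, the field names and the function `download_digest` are assumptions made for this example, not part of any standard, and the signing step is left out because it depends on the website's own key and tooling.

```python
import hashlib
import json


def download_digest(content: bytes, url: str, server_name: str, fetched_at: str) -> str:
    """Return a SHA-256 hex digest binding the file bytes to its metadata."""
    # Hash the file first so the record stays small regardless of file size.
    file_hash = hashlib.sha256(content).hexdigest()
    # Hypothetical record layout (an assumption of this sketch). Sorted keys
    # and fixed separators make the serialization deterministic, so both
    # parties can derive the same digest from the same inputs.
    record = {
        "file_sha256": file_hash,
        "url": url,
        "server": server_name,
        "fetched_at": fetched_at,
    }
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Placeholder values; the website would be asked to sign this digest.
    print(download_digest(
        b"example file bytes",
        "https://example.com/report.pdf",
        "example.com",
        "2024-01-01T12:00:00Z",
    ))
```

Changing any byte of the file or any metadata field yields a different digest, which is why a signature over the digest would tie the website to that exact download.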

Note that the non-repudiation property is not a priori guaranteed for downloads secured by HTTPS or SFTP. The Applicability Statement (AS) specification includes message signing as an option.

There is a lack of forward security here. If the signing key used by the website (or other signatures in the certification chain) is subsequently compromised (either by future computational ability or cyber-attack) then forgeries will become possible. In these circumstances the website could deny the download by stating that all of their website's past traffic could be forged and no such guarantees should be trusted.

forest avatar
vn flag
Importantly, it would require the _prior cooperation_ of the website. A website cannot cooperate after the fact.
br flag
@forest And if a website expects to ever need to deny that they provided a file, they're not likely to provide that cooperation. Unless they're run by politicians, many of whom do not mind denying saying things when there's public record of them doing so.
cn flag
Prior *and ongoing* cooperation. Simply publishing the signing key would invalidate all non-repudiation guarantees.
Aron avatar
in flag
"There is a lack of forward security here" For a small fee, it could be possible to write the hash onto a blockchain. This will prove that the signing was done before a certain date on said blockchain.
Score:12
vn flag

Daniel S explains what is needed to provide non-repudiation for file downloads, so I'll explain why you cannot prove a file was downloaded in the typical situation when using regular TLS.

When a server establishes a TLS session with a client, a master secret is established between them. This secret is used to derive a key that is used for encryption as well as a key used for integrity. To provide integrity, the encrypted data is hashed using an algorithm called a MAC. Unlike most cryptographic hashes, a MAC is keyed. Even for the same message, the digest will be different if the key is different.

Before data is transmitted, the MAC is keyed and is used to hash the ciphertext. The hash is then transmitted along with the ciphertext as a tag. The receiver then uses the same MAC key to hash the ciphertext it receives and compares it against the tag. If it differs, the data may have been tampered with and the session will be torn down.

Even though you, as the client, know the master secret and thus also the MAC key, you cannot prove to anyone that the session that you have recorded is legitimate even if you also prove that the master secret itself is legitimate. This is because you could change the data and re-hash it with the MAC without ever calling into question the authenticity of the master secret. At most, you could prove that a given handshake occurred and that it was authentic, but you can't prove what happened afterwards.
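The point can be shown with a small Python sketch. It uses HMAC-SHA256 over plain byte strings as a stand-in for the record protection described above; it is not the actual TLS record layer, and the key and messages are made-up placeholders.

```python
import hashlib
import hmac


def tag(mac_key: bytes, data: bytes) -> bytes:
    """Compute the keyed MAC tag that would accompany the data."""
    return hmac.new(mac_key, data, hashlib.sha256).digest()


def verify(mac_key: bytes, data: bytes, received_tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(tag(mac_key, data), received_tag)


# Placeholder for the MAC key that both client and server derive.
mac_key = b"session MAC key known to both sides"

# What the server really sent, with the tag it computed.
genuine = b"what the server really sent"
genuine_tag = tag(mac_key, genuine)

# What the client could fabricate afterwards using the very same key.
forged = b"something the client made up"
forged_tag = tag(mac_key, forged)

print(verify(mac_key, genuine, genuine_tag))  # True
print(verify(mac_key, forged, forged_tag))    # True
```

Both checks succeed, so a third party holding only the key and the recorded data cannot tell which side produced a given tag.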

See also "Does a trace of SSL packets provide a proof of data authenticity?".

Marc Ilunga avatar
tr flag
This gets me thinking, is it still the case that no such statement can be made if we are assuming that we are in a mutually authenticated TLS (1.3) session and that the data is transmitted using a committing encryption scheme?
Marc Ilunga avatar
tr flag
Actually, I am not sure I fully agree with the answer. Or, at least, it is not clear to me that the answer is universally true for all versions of TLS. In some versions (and modes), the handshake is signed by the authenticated party or parties, and the key depends on the entire key schedule. Therefore, it is not clear to me what prevents a client from proving that the key used to decrypt traffic was the result of the handshake traffic. Can you expand on that?
forest avatar
vn flag
@MarcIlunga They could prove that the key used to decrypt traffic was the result of the handshake, but because they know the encryption key _and_ the MAC key, they can encrypt arbitrary plaintext with the key and append a valid authentication tag.
Marc Ilunga avatar
tr flag
I agree, the client can indeed create a ciphertext with the content. I think we can, however, prove a weaker statement, in particular if the third party themselves have a copy of the communication transcript.
Score:4
cn flag
  1. You can have witnesses watch you download the file to attest to the fact. I'm not sure how many witnesses you would need, but I'm sure the more the merrier.

  2. On the day that you download the file, while it is available, you can have archive.org (or a similar archiving site) make an archive of it (and of the page that offers it). I'm not sure how well it would hold up in court, but archive.org has no reason to falsify its results for random people. This only works if you don't have to log in to download the file. (Whoops, I guess someone already mentioned this.)

It could be good to make a detached signature for the file to prove the time by which you had downloaded it. This won't tell where you downloaded it from, but knowing when you downloaded it could be useful to show that the company is lying in some cases. However, you could set your computer to a false time to falsify this, so you might need witnesses to give their signatures, too.

Better yet, if you can make the company itself your witness, then, yeah. On the day you download the file, maybe email them about it and ask questions (if they respond and talk about it, then there you go).

dave_thompson_085 avatar
cn flag
You can get a timestamped commitment to a hash from many public TSAs (TimeStamping Authority) which (given a secure hash) proves you had content no later than the TSA-affirmed time. (But you could have had it earlier, and waited before submitting to the TSA.) Or nowadays you can do it yourself on any of several public blockchains -- if you pay, and estimate (aka guess) correctly which one(s) will survive and not get broken and discredited.
Score:3
id flag

Two variants of an idea. First, submit a request to Internet Archive/Wayback Machine to have them mirror the page in question. Note, however, that a particular site might not allow them to mirror it or might retroactively request the page to be taken down.

Second, ask an authoritative third party (notary, lawyer) to perform the download--ideally on a system you do not control--and record the result for posterity.

Taking a screenshot (and then printing it and mailing it to yourself without opening the envelope) is not unusual, but for those who know better it offers only epsilon more evidence than your unsubstantiated claim.

These are obviously proactive steps. For the specific instance of YouTube, I imagine you can file a lawsuit and then have the judge issue discovery against Google who will know the truth. (IANAL).

fraxinus avatar
sa flag
In at least one case, our boss did exactly what you describe as the second option. He went to get a certified print of particular information published on a particular website. Public procurement in some countries is tricky...
forest avatar
vn flag
This might help prove to a person that a file existed on a certain site, but it's not cryptographic in nature. There is no cryptographic guarantee that the Wayback Machine is serving correct and accurate content.
Score:1
cn flag

Since you asked this on Cryptography, not Law, the answer is simple: you cannot. The TLS protocol (arguably very intentionally at this point, although originally it may have been unintentional) does not provide non-repudiation. This is very different from some other cryptographic communication contexts, such as DKIM on email, which does serve as cryptographic proof of where the content originated and which ends up being a perpetual source of liability in breaches.

This answer on sister site Security goes into some of the details.
