Score:0

Using gsutil on Windows to download a Google Workspace Data Export


What is the easiest approach to automatically download all export files onto a Windows system?

I need to download a full Google Workspace Data Export on Windows. A Google Workspace Data Export is similar to Google Takeout, but for the whole organisation.

Once the export files are generated, they can be downloaded one by one through the web interface, or with a gsutil command supplied by the same web interface:

gsutil -m cp -r \
  "gs://takeout-export-.../20210716T081530Z/CustomerOwnedData/" \
  "gs://takeout-export-.../20210716T081530Z/Resource:\ -10235762353432345231/"
  ...50 more lines
  .

This command does not work out of the box on Windows.

So far I've done the following:

  • Removed every backslash line continuation (a "\" followed by a newline), making it a single-line statement.
  • Removed the whitespace escape "\ " inside the filename, since the name is already quoted.

The remaining problem is that the folder names in the export contain ":", which Windows doesn't allow in file or folder names.
I can download individual folders by specifying a new target folder name, but that has to be done by hand, folder by folder.

I've tried to rewrite the command into one command for each folder:

gsutil -m cp -r "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/" "Resource: -10235762353432345231/"

This works only for folders that contain a single file. Most folders contain two files, which results in the following:

CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
CommandException: Destination URL must name a directory, bucket, or bucket
subdirectory for the multiple source form of the cp command.
CommandException: 2 files/objects could not be transferred.

Next, I tried to rename the "Resource: ..." folders in the bucket:

gsutil -m mv "gs://takeout-export-.../20210716T081530Z/Resource: -10235762353432345231/" "gs://takeout-export-.../20210716T081530Z/Resource -10235762353432345231/"

But this failed with:

AccessDeniedException: 403 ...@... does not have storage.objects.create access to the Google Cloud Storage object.

I guess I don't have access to modify the Data Export files.

What do I, as an administrator, need to know to get access to a Google Workspace Data Export?

Mousumi Roy
I suspect the issue is that the GCS object prefix includes colons, which cause problems on Windows. [Windows](https://docs.microsoft.com/en-us/windows/win32/fileio/naming-a-file) doesn't allow certain special characters, including ':', in file and folder names. You can rename the bucket folder (object prefix) so that it no longer contains the ':' and try again.
Score:1
Tom

I've struggled with this as well and went through all the same steps you have. I wish Google would just change their naming scheme to be Windows-compatible. If you had your own paid Cloud account, you could copy the objects and rename the forbidden filenames in the process, but you can't do that in a takeout bucket, since you can't write anything to it yourself.

My solution ended up being to install a Linux distro via WSL2, download with gsutil there, rename the bad folders, and then copy the result into Windows-accessible storage.
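For the renaming step, something along these lines could be run inside the Linux distro once the download has finished. It is only a rough sketch, not the exact script I used: the function names `windows_safe_name` and `rename_tree` are made up for illustration, and replacing forbidden characters with "_" is an arbitrary choice. It assumes the export was downloaded onto the Linux filesystem, where ":" is a legal character.

```python
import os
import re

# Characters Windows rejects in a file or folder name. Names are handled one
# path component at a time, so "/" never appears in the input.
FORBIDDEN = re.compile(r'[<>:"|?*\\]')


def windows_safe_name(name: str) -> str:
    """Return one path component with Windows-forbidden characters replaced."""
    cleaned = FORBIDDEN.sub("_", name)  # "_" is an arbitrary replacement choice
    # Windows also rejects names that end in a space or a dot.
    cleaned = cleaned.rstrip(" .")
    return cleaned or "_"


def rename_tree(root: str) -> list:
    """Rename every file and folder below root to a Windows-safe name.

    Walks bottom-up, so the contents of a folder are renamed before the
    folder itself. Returns the (old_path, new_path) pairs that were renamed.
    """
    renamed = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            safe = windows_safe_name(name)
            if safe == name:
                continue
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, safe)
            if os.path.exists(new_path):
                # Refuse to overwrite if two names collapse to the same result.
                raise FileExistsError(f"{new_path} already exists")
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Calling `rename_tree` on the download folder would turn a folder such as "Resource: -10235762353432345231" into "Resource_ -10235762353432345231", after which the tree can be copied to a Windows drive. It is worth trying on a copy first, since the renames are done in place.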
