
Reading only the metadata of a file in a Google Cloud Storage bucket into a Cloud Function in Python (without loading the file or its data!)


I need something like "Cloud Storage for Firebase: download metadata of all files", but in Python instead of Angular, and only for one chosen file.

The aim is to return this information at the end of the Cloud Function in the return statement, or simply to log it while the Cloud Function runs, as soon as the file has been saved in the Google Cloud Storage bucket. With that information at hand, another job can be started after the given timestamp. The pipeline is synchronous.

I have found Q&As on loading a file or its data into the Cloud Function to extract data stats from the external file.

Since I do not want to hold the large file or its data in memory at any time just to get some metadata, I want to download only the metadata (timestamp and size) of that file, which is stored in a Google Cloud Storage bucket.

How can I fetch only the metadata of a CSV file in a Google Cloud Storage bucket into a Google Cloud Function?

Zeenath S N
Hey, have you tried any code? If so, please share it, and let us know which error you are getting.
questionto42standswithUkraine
@ZeenathSN Good question; I have postponed that so far. For now, I use a workaround in Python: 1. `datetime.now()`, 2. the number of rows written, 3. the number of field_names as the column count. I put these in the log and in the return statement. I have not yet tested whether I could get similar information as metadata from Google Cloud Storage, and I will not take the time to go further unless I get an answer here.
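A minimal sketch of that workaround, assuming the rows are dicts written with `csv.DictWriter` and that `field_names` corresponds to the writer's `fieldnames`. The function name `write_rows_with_stats` is hypothetical, and `out` can be any writable text stream (in the Cloud Function this might be a stream on the target object; `io.StringIO` stands in here). Note that the timestamp comes from the function's own clock, not from Cloud Storage.

```python
import csv
import io
from datetime import datetime, timezone


def write_rows_with_stats(rows, fieldnames, out):
    """Write dict rows as CSV to `out` and return simple stats about the write.

    `out` is any writable text stream (hypothetical stand-in for the real target).
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    row_count = 0
    for row in rows:
        writer.writerow(row)
        row_count += 1
    return {
        # Time the write finished, from the function's clock
        # (not the object's server-side timestamp in Cloud Storage).
        "finished_at": datetime.now(timezone.utc).isoformat(),
        "row_count": row_count,
        "column_count": len(fieldnames),
    }


if __name__ == "__main__":
    buf = io.StringIO()
    stats = write_rows_with_stats(
        [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], ["id", "name"], buf
    )
    print(stats["row_count"], stats["column_count"])  # 2 2
```

The returned dict can be both logged and used as the function's return value.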
Zeenath S N
Perhaps this [GitHub link](https://github.com/googleapis/python-storage/blob/main/samples/snippets/storage_get_bucket_metadata.py) might help you?
questionto42standswithUkraine
@ZeenathSN Thanks, but no: it loads the file from the bucket into the memory of the GCF container (if I am not mistaken; please correct me otherwise), see `bucket = storage_client.get_bucket(bucket_name)`. Doing that would cause useless traffic, since the file I deal with is large. I save it directly to GCS to avoid having it in memory in the GCF container, so I do not want to load the whole file just to get its metadata.
questionto42standswithUkraine
@ZeenathSN I just realized that I made a mistake above: the bucket itself is not the file anyway, so the code does return metadata, but for the bucket, not for a chosen file. Therefore the GitHub code does not solve it, but it goes in the right direction.

There is a Google document that shows how to get metadata, similar to the GitHub link I provided in the comment. You can look at the library here.

It only gets the metadata and does not retrieve the object data until you call `download_to_filename()`.
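A minimal sketch of that library approach, assuming the `google-cloud-storage` client library: `Bucket.get_blob()` requests the object's metadata and returns `None` if the object does not exist, and the returned `Blob` exposes properties such as `size`, `updated` and `time_created`. The helper name `get_object_metadata` and the choice of fields are illustrative, not part of the library.

```python
import logging


def get_object_metadata(bucket, blob_name):
    """Return selected metadata of one object without downloading its content.

    `bucket` is expected to be a google.cloud.storage.Bucket (or any object with
    the same get_blob method). get_blob() issues a metadata-only request and
    returns None if the object does not exist.
    """
    blob = bucket.get_blob(blob_name)
    if blob is None:
        raise FileNotFoundError(f"gs://{bucket.name}/{blob_name} not found")
    info = {
        "name": blob.name,
        "size": blob.size,                  # size in bytes
        "updated": blob.updated,            # last modification time of the object metadata
        "time_created": blob.time_created,  # creation time of the object
        "content_type": blob.content_type,
    }
    logging.info("Object metadata: %s", info)
    return info
```

In a Cloud Function, `bucket` could be created with `storage.Client().bucket("my-bucket")` (after `from google.cloud import storage`), which builds the bucket handle without an API call; the bucket name and object path are placeholders.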

Alternatively, have a look at the JSON API `Objects: get` documentation, which shows that it retrieves only the metadata unless `alt=media` is specified, and try it.
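For the API route, a stdlib-only sketch, assuming the JSON API endpoint `https://storage.googleapis.com/storage/v1/b/BUCKET/o/OBJECT` and an OAuth 2.0 access token obtained elsewhere (for example with the `google-auth` library; obtaining it is not shown). Because `alt=media` is not added to the URL, the response is the object's metadata resource rather than its content. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API_ROOT = "https://storage.googleapis.com/storage/v1"


def metadata_url(bucket_name, object_name):
    """Build the objects.get URL; without alt=media it returns metadata JSON."""
    # Object names may contain "/", so encode every reserved character.
    encoded = urllib.parse.quote(object_name, safe="")
    return f"{API_ROOT}/b/{bucket_name}/o/{encoded}"


def fetch_metadata(bucket_name, object_name, access_token):
    """GET the object's metadata resource; `access_token` is assumed valid."""
    req = urllib.request.Request(
        metadata_url(bucket_name, object_name),
        headers={"Authorization": f"Bearer {access_token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize(resource):
    """Pick size and timestamps from an object resource dict."""
    return {
        # The JSON API returns the byte count as a string.
        "size": int(resource["size"]),
        "updated": resource["updated"],
        "time_created": resource["timeCreated"],
    }
```

Encoding with `safe=""` turns an object path such as `dir/file.csv` into `dir%2Ffile.csv`, which is how the object name must appear in the request path.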

questionto42standswithUkraine
You are right, `blob = bucket.get_blob(blob_name)` probably does not yet load the file into the Cloud Function, since it is used in the metadata example query: [View and edit object metadata](https://cloud.google.com/storage/docs/viewing-editing-metadata#view) --> "Code Samples" --> "Python". I had overlooked these code samples and thought that I would have to use [`gsutil`, which is not available in a GCF](https://stackoverflow.com/questions/61795056/run-a-gsutil-command-in-a-google-cloud-function). My mistake.
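For the pipeline in the question, where the function itself writes the CSV to the bucket, a sketch that streams the rows into the object and then refreshes only its metadata. It assumes a recent `google-cloud-storage` version in which `Blob.open("w")` returns a text-mode writer that uploads in chunks, and it uses `Blob.reload()`, which fetches the object's metadata without downloading the content. The function name `write_csv_then_read_metadata` is hypothetical.

```python
import csv


def write_csv_then_read_metadata(blob, rows, fieldnames):
    """Stream rows into a Cloud Storage object, then fetch only its metadata.

    `blob` is expected to be a google.cloud.storage.Blob. The writer returned by
    blob.open("w") uploads in chunks, so the whole file is not held in memory;
    the upload is finalized when the writer is closed.
    """
    with blob.open("w") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    blob.reload()  # metadata-only request; the object content is not downloaded
    return {"size": blob.size, "updated": blob.updated}
```

Calling `reload()` after the writer is closed makes the returned `size` and `updated` reflect the object as stored, rather than relying on whether the local `Blob` was refreshed by the upload.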