Score:1

Storing PDF Subject Data in Media Entities

I have PDF documents that I need to create as media entities on our website. The PDFs were created with metadata, such as title and subject, that we want to have available in Drupal.

I am trying to save the subject metadata in a new field that I added to the Document media type.

I do not believe that PDF metadata is stored in either the media object or the file object by default.

I have tried hooking into media creation with hook_media_insert() to save the field, but I have not yet found an easy way to get that information from either the Media or the File object.

Is there a way to read that metadata from the PDFs?

Your expectation is that PDF-specific metadata would be extracted into Drupal fields. Do I have that right? If so, you would need some kind of library to do that, and this would probably be a duplicate of https://stackoverflow.com/questions/4493189/reading-pdf-metadata-in-php.
Score:1

Here is how I was able to achieve what I was looking for. The possible duplicate contained a piece of what I needed; however, the full implementation below addresses a few additional challenges.

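use Drupal\Core\Entity\EntityInterface;
use Drupal\file\Entity\File;

/**
 * Implements hook_ENTITY_TYPE_presave() for media entities.
 */
function my_module_media_presave(EntityInterface $media) {
  // Assumption: the machine name of the file field on the Document media
  // type. Adjust it to match your site.
  $media_field = 'field_media_document';

  // The hook fires for every media type, so skip those without the field.
  if (!$media->hasField($media_field)) {
    return;
  }
  $fid = $media->get($media_field)->target_id;

  if (empty($fid)) {
    return;
  }
  $file = File::load($fid);
  if (!$file) {
    return;
  }

  // Read only the first 3000 bytes to mitigate out-of-memory errors on large PDFs.
  $file_contents = file_get_contents($file->getFileUri(), FALSE, NULL, 0, 3000);
  preg_match('/(?<=Subject)\S(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))./', $file_contents, $subject);

  if (!empty($subject)) {
    $media->set('field_description', $subject[0]);
  }
}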

Note that this keeps the enclosing () around the description. If you don't want them, you can either update the regex or strip them with a string replace.
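
As a rough sketch of the second option, the match can be trimmed by dropping its first and last character. The helper name my_module_extract_pdf_subject() and the sample strings are made up for illustration. The pattern is the same one used above, which expects the opening delimiter directly after "Subject" with no space in between.

```php
<?php

/**
 * Hypothetical helper: returns the Subject text without its enclosing
 * delimiters, or NULL when the pattern does not match.
 */
function my_module_extract_pdf_subject(string $chunk): ?string {
  // Same pattern as in the presave hook above.
  $pattern = '/(?<=Subject)\S(?:(?<=\().+?(?=\))|(?<=\[).+?(?=\]))./';
  if (!preg_match($pattern, $chunk, $match)) {
    return NULL;
  }
  // The full match starts and ends with the delimiter, so drop both ends.
  return substr($match[0], 1, -1);
}

// Made-up sample resembling part of a PDF info dictionary.
$sample = '<< /Title(Annual Report) /Subject(Budget summary) >>';
$subject = my_module_extract_pdf_subject($sample);
```

In the hook, the returned value could then be passed to $media->set() in place of $subject[0], after checking that it is not NULL.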
