Score:1

How to migrate files from outside Drupal, with files already where they need to be, creating file entities

pe flag

I'm doing a migration from a non-Drupal site--so far, for test purposes. I have all the data in CSV files.

Since I'm brand new to migrating into Drupal 9 from outside Drupal, I'm learning this in small, somewhat simple phases.

The source data includes a bunch of records, many of which have attached files. Some files are attached to more than one record. Some records have multiple files attached. "Attached" here means a URL is stored, along with a bit of metadata, like a short description and a type. In the database, this is a join table that relates files to records.

What I want to achieve eventually in Drupal:

All these records migrated in as nodes, the files joined to media, and those joined to the correct nodes by entity reference. The created media entities should have the old metadata (description, type) in custom fields.

I gather that the way this should be done is:

  1. Migrate the files into file entities
  2. Using a migration group (migrate_plus, migrate_tools, and migrate_source_csv modules), use the same data source and migration_lookup to migrate the media entities
  3. Migrate the nodes in and use the process plugin entity_generate and a value_key of target ID or something to relate the nodes to the right media entities.

The files are already where they need to be on the server, and the paths/URIs are stored in a csv file along with the description and type, a unique ID field and the ID of each related record.

As a starting point, I attempted to import 30 files as a standalone import. The migrate_files module didn't seem like a good fit, mostly because I can't figure out how to adapt it to a situation where the media entities are going to pull field data from a CSV, and the file URIs are stored in a CSV as well.

So I thought I'd try it with a mostly standard setup.

This was my yaml:

uuid: 1bcec3e7-0a49-4473-87a2-6dca09b91abjan-test1
id: fileimptest
label: Test file import
migration_group: default
source:
  plugin: 'csv'
  path: '/srv/imports/filetest1.tab'
  delimiter: "\t"
  enclosure: '"'
  header_offset: null
  ids: [aid]
# not using most of these fields in the file import
# but including because maybe needed for grouping
# and migrate_lookup in the media import?
  fields:
    0:
      name: aid
      label: 'Unique Id'
    1:
      name: title
      label: 'description'
    2:
      name: formflag
      label: 'FormYN'
    3:
      name: newpath
      label: 'path'
    4:
      name: docnum
      label: 'doc number'
    5:
      name: doctype
      label: 'document type'

process:
  uid:
    plugin: default_value
    default_value: 179
  uri: path
destination:
  plugin: entity:file

The result was that my 30 test files appeared in the file list under admin/content with a status of temporary. The links look correct, but clicking them results in 403 access denied (folder permissions are 777 and the folder is owned by the webserver). (I am using the private file system and have several files uploaded through normal field widgets; these are listed with status 'permanent.' Their links look the same apart from subdirectories, but they open normally when clicked.)

So questions are:

  • What am I doing wrong so far?
  • Is there a better way? (I'm pretty sure there is, but what?)

(Detail: uid 179 is just a user I created named "importer". I should note that I've read this and this, and lots of examples in the related modules. Together they have informed what I've come up with so far, to the degree I understand them.)

Edit: "temporary status" just means there are no uses yet, so it's not important at this point. The only thing that seems wrong with this test import is the access denied issue. Is the migrate process missing something the private file system needs to function fully? Maybe private files can only be viewed if they are 'used' on another entity? I haven't found info on this or come up with a way to test it yet.

Edit2: per the comments and answer below, the status can be set to permanent during the import, and the access denied is normal under these conditions: when the imported file is a) temporary, b) not used anywhere, and c) clicked by a user other than the uid on the file.

Score:1
cn flag

The temporary status is a field on the file entity (status). Set it to 1 during import to make the file permanent.

process:
  uid:
    plugin: default_value
    default_value: 179
  uri: path
  status:
    plugin: default_value
    default_value: 1

File usage is a separate (related) thing and shouldn't come into play here. Those references will be added when you associate the files with content etc.
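
For the follow-up step of associating the files with media, here is a minimal sketch of a media migration that reads the same tab-delimited source and uses migration_lookup to find the file entity created by fileimptest. Several names here are assumptions, not taken from the question: the migration id mediaimptest, the bundle document with its source field field_media_document (as in core's Document media type), and the custom fields field_description and field_doctype. Substitute whatever your media type actually uses.

```yaml
# Sketch only: the id, bundle, and field_* names below are assumed.
id: mediaimptest
label: Test media import
migration_group: default
source:
  plugin: csv
  path: '/srv/imports/filetest1.tab'
  delimiter: "\t"
  enclosure: '"'
  header_offset: null
  ids: [aid]
  # No header row, so every column is listed in file order.
  fields:
    - name: aid
      label: 'Unique Id'
    - name: title
      label: 'description'
    - name: formflag
      label: 'FormYN'
    - name: newpath
      label: 'path'
    - name: docnum
      label: 'doc number'
    - name: doctype
      label: 'document type'
process:
  # Media label, taken from the old description.
  name: title
  uid:
    plugin: default_value
    default_value: 179
  # Look up the file ID that fileimptest created for this aid.
  field_media_document/target_id:
    plugin: migration_lookup
    migration: fileimptest
    source: aid
  # Hypothetical custom fields holding the old metadata.
  field_description: title
  field_doctype: doctype
destination:
  plugin: entity:media
  default_bundle: document
migration_dependencies:
  required:
    - fileimptest
```

Saving a media entity that references the file should register a file usage entry for it, which is the association described above.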

pe flag
Thanks. Do you know if usage impacts the ability to open a file from a link in the files list? I'm probably going to proceed with the files+media migration group test next, so I'll see what the impact is on my 403s. Mostly just curious right now whether the lack of any associated entities would explain why the files all 403, or if I probably have another problem.
cn flag
In a very specific case, which it sounds like you could be triggering, yes - if the file is temporary, has no usage, and the owner of the file is not the current user, then access is denied.
pe flag
I can confirm that this is why I get access denied on these. Logged in as UID 179 and the files are openable. Thanks for your help!