Score:0

Subsequent migrations take a long time to begin importing data to destination


I've got a pretty large migration set of roughly 200k users. The first time I run the migration (via drush), or after rolling it back and starting again, the rollback+import starts immediately. By this I mean that the progress bar starts showing progress on importing items right away.

I know there's no way around the migration itself taking a long time due to the number of items, but I'm running into an issue on subsequent runs where, before the data actually starts importing into the destination, the migration just sits there, seemingly without doing anything, for an insane amount of time. By subsequent runs, I mean any run of the migration that is not the first run, or the first run after a rollback. So, a migration that is executed either to pull in additional users or is run after the initial one hits an error.

If I add --feedback=x, I do see a console message of Processed 0 items (0 created, 0 updated, 0 failed, 0 ignored) - continuing with 'upgrade_d7_user' every so often, so I know it must be doing something and updating after that number of items, but I don't know what it is. It seems like we are just waiting for it to "take a look" at every item before processing it, which does not happen on an "initial" run, and I'd guess is basically doubling how long the run will take. I guess my questions are:

  • What exactly is the migration even doing at this point? Is it just doing some sort of data verification?
  • Is there some way to bypass this step and just jump directly into processing the data itself? We're already looking at days to run this single migration and this extra time is pretty debilitating.
sonfd avatar
in flag
This is a recurring migration, where subsequent runs should update existing entities with new data from the source?
cn flag
Yes and no. At the moment I'm not concerned with updating existing entities from the source, and am not passing in the update flag. Right now I'm just trying to import entities that do not exist on the destination.
Score:0

I think what's happening here is that even though the migration has already been run once, each subsequent run will trigger the prepareRow() and hook_migrate_prepare_row() implementations for every row of the migration, regardless of whether that row has already been imported or not. Check out \Drupal\migrate\Plugin\migrate\source\SourcePluginBase::next to see where this happens. Basically, the source plugin runs a query and builds the set of rows to migrate, then iterates over each one and skips the ones that have already been imported, but not before first calling the prepare row implementations.

When you're migrating Users, or any kind of fieldable entity, the source plugin (\Drupal\user\Plugin\migrate\source\d7\User in this case) likely extends \Drupal\migrate_drupal\Plugin\migrate\source\d7\FieldableEntity and calls the \Drupal\migrate_drupal\Plugin\migrate\source\d7\FieldableEntity::getFields method. See \Drupal\user\Plugin\migrate\source\d7\User::prepareRow for example. This runs a query, once for every row, to see if the source entity has any fields associated with it. I'm guessing that running this query 200k+ times is pretty slow, and that this is what causes subsequent runs to take a long time before anything is imported.

One way around this is to use a high water mark.

What this will do is modify the query the source plugin runs to build the set of rows it loops over, so that it only returns rows whose value is above the high water mark. That way the set doesn't get populated with a bunch of rows that have already been imported, and subsequent runs should be much faster because they only process new rows.
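
For reference, here is a minimal sketch of what that could look like in the source section of the user migration. I haven't run this against your site, so treat it as an assumption rather than a tested config: it uses uid as the high water property (which fits the "only import users that don't exist yet" case, assuming uids only ever increase on the source), and u is the alias the d7_user source query uses for the users table.

```yaml
# Sketch of the source section of the user migration (for example the
# upgrade_d7_user migration from the question). Other keys are left out.
source:
  plugin: d7_user
  # Assumption: uid only ever increases on the source site, so "rows above
  # the mark" means "users added since the last run".
  high_water_property:
    name: uid
    # Table alias the d7_user source query uses for the users table.
    alias: u
```

As far as I understand it, the mark is only recorded by runs that have this setting in place, so the first run after adding it to a migration that has already imported rows may still walk through everything once.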

cn flag
Well, we're now done with the migration so I cannot test this out, but what you're describing definitely sounds like the solution so I'm going to mark it accepted. I came across high water marks a couple of times while researching this but don't think I understood how to implement them. Those links are good resources.