Score:3

How to re-index all content for the core search engine?

Is there a way to re-index all content for the core search engine?

In older versions of Drush, you could re-index with:

drush search-reindex --immediate --verbose

However, this no longer appears to be an option. In more recent versions of Drush, one can run:

drush search-api-reindex

But that command is for the Search API module, not for the core search index.

How do I re-index the core search index in bulk? Cron will only index up to 500 items at a time, but I have tens of thousands of items to index.

Score:2

You can use the search index service directly:

drush ev "\Drupal::service('search.index')->markForReindex();"

and then run drush cron as often as you need to.

For example, in a Bash loop:

for (( c=1; c<=5; c++ )); do drush cron; done
This marks the items for re-indexing, but I don't think it re-indexes them. Is this any different from clicking the "Re-index site" button at admin/config/search/pages?
4uk4: This answer is about a scripting solution that replaces the no-longer-available single Drush command with multiple commands to re-index in bulk. It's not about the UI. Additionally, the --immediate option to run this in a single batch had its limitations: even when the old command was still available, re-indexing should have been run in batches to avoid time-outs and memory limits.
If I run this markForReindex() function, and then run cron once, will this result in all content being indexed? It does not appear so. It appears to have the same effect as clicking the "Re-index site" button at admin/config/search/pages and then running cron. In other words, it will re-index one batch -- not all the content.
The problem with batches is that it will take more than two weeks to index all the content at this pace. There are tens of thousands of items to index.
Kevin: Why would it take weeks to index? I have a site with 120,000 items and it only takes a few hours via Drush (because it also extracts and indexes PDF attachments). I would just switch to Search API / Search API Database; it's a million times better than the core search module.
I believe I have to use the core search module because it is a dependency for the Views "Search Keywords" filter. Separately, I am using Search API for the site's main search engine. But I have a Views filter for which I need the core search function. Why would it take weeks? I don't know. What I'm seeing is if I run cron, it goes from 13% complete to 14% complete, etc. And why is running cron the only way to re-index? Or is it? What if I don't want other cron-initiated processes to run?
4uk4: You don't have to wait for weeks, you can run `drush cron` in a loop back to back. Then it doesn't matter what batch size you have configured, the overhead for each batch run won't make much of a difference. If you don't want to run other cron-initiated processes you could invoke the `search_cron()` hook directly.
4uk4: The drawback is that you could run into problems with concurrent cron runs, so it's better to use `drush cron`.
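If you want to try the direct route anyway, here is a minimal sketch. `run_search_cron` is a hypothetical helper name, and it assumes your Drupal version still defines `search_cron()` as a plain function in search.module. Because it skips the cron lock, it is subject to the concurrency caveat above. The optional `DRUSH` variable is also an assumption of this sketch; it lets you point the helper at a different Drush binary such as vendor/bin/drush.

```shell
# Hypothetical helper (not part of Drush): call only the search module's
# cron hook N times, instead of running every module's cron.
# Assumes search_cron() exists as a procedural function in your Drupal version.
# Caveat: this bypasses the cron lock, so do not run it while a regular
# cron run may be in progress.
run_search_cron() {
  local runs=${1:-1} drush_bin=${DRUSH:-drush} c
  for (( c = 1; c <= runs; c++ )); do
    # Stop at the first failing run instead of hammering a broken site.
    "$drush_bin" ev 'search_cron();' || return 1
  done
}
```

For example, `run_search_cron 10` calls `search_cron()` ten times in a row.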
Thanks. I ran `drush cron`. It appears that `search_cron()` takes a very long time. In the terminal, the output gets to `"Starting execution of search_cron()"` and just hangs. I don't know whether it's doing anything, but if I refresh `admin/config/search/pages` every few minutes, the number of items indexed isn't changing. The cron run never seems to complete. PS: Actually, I just saw a change at `admin/config/search/pages`, but only one item is added to the index every several minutes.
4uk4: In a normal Drupal install you can index 500 items at a time. If rendering the nodes to be indexed is more time-consuming than usual, you need to reduce the number of items per run until the script finishes in a reasonable time. See INDEXING THROTTLE on admin/config/search/pages.
I was missing indices in the search_index database table, which was dramatically slowing down re-indexing. That's now fixed and the cron runs can now complete. It takes about 2 minutes per cron run if I set the Indexing Throttle to 500. So it's actually not impractical in my case to run cron lots of times until it's done.
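With the numbers from the comment above (500 items per run, about 2 minutes per run), a rough estimate shows why back-to-back cron runs finish in hours rather than weeks. This is a minimal sketch; `estimate_reindex` is a hypothetical helper, and the 30,000-item total in the usage line is a hypothetical stand-in for "tens of thousands".

```shell
# Hypothetical helper: rough estimate of a full re-index, as number of cron
# runs and total minutes. All three inputs are supplied by you; nothing is
# read from Drupal here.
estimate_reindex() {
  local total=$1 per_run=$2 minutes_per_run=$3
  # Round up: a partial last batch still needs its own cron run.
  local runs=$(( (total + per_run - 1) / per_run ))
  echo "$runs runs, ~$(( runs * minutes_per_run )) minutes"
}
```

For example, `estimate_reindex 30000 500 2` prints `60 runs, ~120 minutes`.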
4uk4: Use a loop; I've added an example to the answer.
Score:0

Here is an opinionated solution that automates this, based on @4uk4's answer:

remaining=$(drush ev "echo \Drupal::database()->query('SELECT COUNT(DISTINCT [n].[nid]) FROM {node} [n] LEFT JOIN {search_dataset} [sd] ON [sd].[sid] = [n].[nid] AND [sd].[type] = :type WHERE [sd].[sid] IS NULL OR [sd].[reindex] <> 0', [':type' => 'node'])->fetchField();"); \
size="$(drush cget search.settings index.cron_limit --format=string)"; \
iterations=$(( (remaining + size - 1) / size )); \
for (( c=1; c<=iterations; c++ )); do printf 'Working on iteration %d/%d\n' "$c" "$iterations"; drush cron; done

This "one-line" script gets the number of items remaining to be indexed, calculates how many iterations (cron runs) are needed, and then runs cron that many times.

Note: the remaining-items query above counts nodes only.

BTW: you can control the "Number of items to index per cron run" setting with the drush config:set command.
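For example, a small wrapper that validates the value before writing it. `set_index_throttle` is a hypothetical helper name; the config object and key (`search.settings`, `index.cron_limit`) are the ones the script above reads with `drush cget`.

```shell
# Hypothetical helper: set "Number of items to index per cron run"
# (search.settings:index.cron_limit) via drush config:set.
set_index_throttle() {
  local limit=$1 drush_bin=${DRUSH:-drush}
  # Reject anything that is not a positive integer before touching config.
  if [[ ! $limit =~ ^[1-9][0-9]*$ ]]; then
    echo "index.cron_limit must be a positive integer, got: '$limit'" >&2
    return 1
  fi
  # -y answers the confirmation prompt non-interactively.
  "$drush_bin" config:set search.settings index.cron_limit "$limit" -y
}
```

`set_index_throttle 200` sets 200 items per cron run; `drush cget search.settings index.cron_limit --format=string` shows the current value.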

Note: This script works well, but for Drupal 9.x you have to change [':type' => 'node'] to [':type' => 'node_search'] in the query.
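A variation on the script above re-checks the remaining count after every cron run and stops when it reaches zero, instead of precomputing the number of iterations. This is a minimal sketch: `remaining_items` and `reindex_until_done` are hypothetical helper names, the query is the one from this answer with `'node_search'` as the type (so it assumes Drupal 9.x), and `max_runs` is a safety cap so the loop cannot run forever if some items never get indexed.

```shell
# Hypothetical helpers (not part of Drush). Set DRUSH to use another
# binary, e.g. DRUSH=vendor/bin/drush.

# Prints how many nodes still need (re-)indexing, using the query from
# the answer with 'node_search' as the type (Drupal 9.x, per the note above).
remaining_items() {
  "${DRUSH:-drush}" ev "echo \Drupal::database()->query('SELECT COUNT(DISTINCT [n].[nid]) FROM {node} [n] LEFT JOIN {search_dataset} [sd] ON [sd].[sid] = [n].[nid] AND [sd].[type] = :type WHERE [sd].[sid] IS NULL OR [sd].[reindex] <> 0', [':type' => 'node_search'])->fetchField();"
}

# Usage: reindex_until_done [max_runs]
# Runs drush cron until nothing is left to index or max_runs is reached.
reindex_until_done() {
  local drush_bin=${DRUSH:-drush} max_runs=${1:-1000} run=0 remaining
  remaining=$(remaining_items) || return 1
  while (( remaining > 0 && run < max_runs )); do
    run=$(( run + 1 ))
    printf 'Run %d, %d items remaining\n' "$run" "$remaining"
    "$drush_bin" cron || return 1
    remaining=$(remaining_items) || return 1
  done
  printf 'Done after %d runs, %d items remaining\n' "$run" "$remaining"
}
```

Source the functions, then run for example `reindex_until_done 200`.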
