Score:-1

Can't get page html after updating the page and flushing all caches

After I update a page via the admin panel, I need to programmatically flush all caches, then fetch the page's HTML source and write it to a file. I use the following code in my module:

function mymodulename_node_update($node) {
    drupal_flush_all_caches();
    $nid = $node->nid->value;
    $nodePath = \Drupal::service('path.alias_manager')->getAliasByPath('/node/'.$nid);
    $content = file_get_contents('https://mydomain.com'.$nodePath);
    file_put_contents(__DIR__ . '/test.html', $content);
}

But after a long time (longer than a normal cache flush via the admin panel takes) I just receive the error

"file_get_contents(https://mydomain.com/path-to-page): failed to open stream: HTTP request failed! in..."

and the file is never written.

If I try to get the content via cURL, the admin panel page never finishes loading.

What's wrong?

leymannx
Isn't it just `$node->id()`?
leymannx
Do you know [Tome](https://www.drupal.org/project/tome)? It's a static site generator for Drupal.
stckvrw
@leymannx I receive the correct nid with the existing method. Anyway, `->id()` still doesn't resolve the problem
leymannx
Maybe some php.ini: https://stackoverflow.com/a/3488430/2199525
stckvrw
No, if I remove the line `drupal_flush_all_caches()` the code works correctly
stckvrw
I even tried to add `sleep(10)` after the flushing and before getting the content, but without success
leymannx
Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/128459/discussion-between-leymannx-and-stckvrw).
Score:2

This is not a valid use case for `drupal_flush_all_caches()`. That function is meant for new or changed code. For content changes, use cache tags.

In the rare case where it is not possible to add correct cache tags to all render arrays, you can invalidate the `rendered` tag, which is added by default even if no cache tags are specified:

\Drupal\Core\Cache\Cache::invalidateTags(['rendered']);

BTW, entity operations are handled in transactions, so a concurrent page request might not see the changes until they are committed to the database. In that case a cache clear (in any form) doesn't help.
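
To illustrate the cache-tag approach, here is a minimal sketch. The module name `mymodule`, the helper `mymodule_build_teaser()` and the custom tag `mymodule:teasers` are hypothetical and only serve the example. The render array declares which tags it depends on, and the update hook invalidates only that tag instead of flushing every cache.

```php
<?php

use Drupal\Core\Cache\Cache;
use Drupal\node\NodeInterface;

/**
 * Builds a render array that declares its cache dependencies.
 *
 * Hypothetical helper, not part of Drupal core.
 */
function mymodule_build_teaser(NodeInterface $node): array {
  return [
    '#plain_text' => $node->label(),
    '#cache' => [
      // The node's own tags ("node:<nid>") plus a hypothetical custom tag.
      'tags' => Cache::mergeTags($node->getCacheTags(), ['mymodule:teasers']),
    ],
  ];
}

/**
 * Implements hook_ENTITY_TYPE_update() for node entities.
 */
function mymodule_node_update(NodeInterface $node) {
  // Saving a node already invalidates its own "node:<nid>" tag, so only
  // the custom tag needs to be invalidated explicitly here.
  Cache::invalidateTags(['mymodule:teasers']);
}
```

Anything rendered with the node's tags is rebuilt on the next request after a save, so the explicit invalidation is only needed for tags that the saved entity does not carry itself.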

Edit: Adding an example for the latest comment.

A nice solution would be a queue worker. It runs in the background, so you don't have to wait for the admin panel to respond after a node save.

Example:

In the hook add a queue item with the entity ID:

Media::postSave()

\Drupal::queue('media_entity_thumbnail')->createItem(['id' => $translation->id()]);

This adds a queue item for the following queue worker plugin:

/modules/media/src/Plugin/QueueWorker/ThumbnailDownloader.php

<?php

namespace Drupal\media\Plugin\QueueWorker;

use Drupal\Core\Entity\EntityTypeManagerInterface;
use Drupal\Core\Plugin\ContainerFactoryPluginInterface;
use Drupal\Core\Queue\QueueWorkerBase;
use Symfony\Component\DependencyInjection\ContainerInterface;

/**
 * Process a queue of media items to fetch their thumbnails.
 *
 * @QueueWorker(
 *   id = "media_entity_thumbnail",
 *   title = @Translation("Thumbnail downloader"),
 *   cron = {"time" = 60}
 * )
 */
class ThumbnailDownloader extends QueueWorkerBase implements ContainerFactoryPluginInterface {

  /**
   * The entity type manager service.
   *
   * @var \Drupal\Core\Entity\EntityTypeManagerInterface
   */
  protected $entityTypeManager;

  /**
   * Constructs a new class instance.
   *
   * @param array $configuration
   *   A configuration array containing information about the plugin instance.
   * @param string $plugin_id
   *   The plugin_id for the plugin instance.
   * @param mixed $plugin_definition
   *   The plugin implementation definition.
   * @param \Drupal\Core\Entity\EntityTypeManagerInterface $entity_type_manager
   *   Entity type manager service.
   */
  public function __construct(array $configuration, $plugin_id, $plugin_definition, EntityTypeManagerInterface $entity_type_manager) {
    parent::__construct($configuration, $plugin_id, $plugin_definition);
    $this->entityTypeManager = $entity_type_manager;
  }

  /**
   * {@inheritdoc}
   */
  public static function create(ContainerInterface $container, array $configuration, $plugin_id, $plugin_definition) {
    return new static(
      $configuration,
      $plugin_id,
      $plugin_definition,
      $container->get('entity_type.manager')
    );
  }

  /**
   * {@inheritdoc}
   */
  public function processItem($data) {
    /** @var \Drupal\media\Entity\Media $media */
    if ($media = $this->entityTypeManager->getStorage('media')->load($data['id'])) {
      $media->updateQueuedThumbnail();
      $media->save();
    }
  }

}

By default, cron runs only every 3 hours. If you need the static HTML sooner, trigger cron (which also processes the queues) from outside the website. See https://www.drupal.org/docs/user_guide/en/security-cron.html
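
Applied to the static HTML use case from the question, the same pattern could look roughly like the sketch below. It is not a tested implementation. The module name `mymodule`, the queue ID `mymodule_static_export`, the class name `StaticExport` and the target path `public://static-<nid>.html` are all assumptions made for illustration. The hook only records the node ID:

```php
<?php
// mymodule.module (hypothetical module name).

use Drupal\node\NodeInterface;

/**
 * Implements hook_ENTITY_TYPE_update() for node entities.
 */
function mymodule_node_update(NodeInterface $node) {
  // Only queue the node ID here. The page is fetched later by the queue
  // worker, after the save transaction has been committed.
  \Drupal::queue('mymodule_static_export')->createItem(['nid' => $node->id()]);
}
```

The worker then fetches the page and writes the file when cron processes the queue:

```php
<?php
// src/Plugin/QueueWorker/StaticExport.php (hypothetical plugin).

namespace Drupal\mymodule\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;
use Drupal\node\Entity\Node;

/**
 * Writes the rendered HTML of updated nodes to files.
 *
 * @QueueWorker(
 *   id = "mymodule_static_export",
 *   title = @Translation("Static HTML export"),
 *   cron = {"time" = 60}
 * )
 */
class StaticExport extends QueueWorkerBase {

  /**
   * {@inheritdoc}
   */
  public function processItem($data) {
    $node = Node::load($data['nid']);
    if (!$node) {
      // The node was deleted in the meantime; nothing to export.
      return;
    }
    // Build the absolute URL of the node page and request it anonymously.
    $url = $node->toUrl('canonical', ['absolute' => TRUE])->toString();
    // With Guzzle's default settings a 4xx/5xx response throws an exception,
    // so no file is written for a failed request.
    $response = \Drupal::httpClient()->get($url, ['timeout' => 30]);
    // Assumed target location in the public files directory.
    file_put_contents('public://static-' . $node->id() . '.html', (string) $response->getBody());
  }

}
```

Because the request happens in a separate cron run rather than inside the node save, it no longer competes with the request that is still saving the node. One caveat: when cron is started from the command line, the absolute URL may be built from a default base URL, so the site's base URL may need to be passed explicitly.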

stckvrw
Thanks. Flushing the caches and generating the static page serve different purposes; we just do both when a page is updated. We ran into some bugs on our website that appeared after updating a page via the admin panel. For example, the content of some pages might disappear, or index.php appeared in menu links. Flushing the caches resolves such bugs.
4uk4
You get index.php in cached links if someone visits the page with a URL containing index.php. To prevent that you can redirect such traffic to clean URLs with https://www.drupal.org/project/redirect
stckvrw
With invalidating the tag, my code now works without the error. But as you mentioned, it doesn't see the changes. How can I resolve this? If I add `sleep(10)` before getting the content, I again get the same error as before.
4uk4
A nice solution would be a queue worker. It runs in the background, so you don't have to wait for the admin panel to respond after a node save. And you don't have to wait hours to get the static HTML; it's no problem to run cron every 2 or 3 minutes.