Content types don't define a URL structure of their own, but each content item (node) of any type can be given a URL alias, and that alias may include sub-paths. When an alias is configured, Drupal uses it for the links it generates to the node. With the contributed Redirect module, requests to the generic node/* URLs can also be redirected to the alias. So you can actually use a robots.txt file to exclude certain paths from crawling.
With the Pathauto module you can even generate these human-readable, search-engine-friendly URL aliases automatically, using a different pattern for each content type. This way, all of your "hidden" content can automatically be placed under a common path.
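To illustrate why a common path prefix matters, here is a minimal PHP sketch (PHP 8+, for `str_starts_with()`) of how plain `Disallow` rules are applied: any URL path that starts with a disallowed prefix is skipped. The helper names are hypothetical, and so is the Pathauto pattern `hidden/[node:title]` with its matching `Disallow: /hidden/` rule. The parser deliberately ignores User-agent groups, Allow rules and wildcard extensions.

```php
<?php

/**
 * Collects the Disallow path prefixes from robots.txt content.
 *
 * Simplified sketch: User-agent groups, Allow rules and the "*" / "$"
 * wildcard extensions are ignored.
 */
function robots_disallow_prefixes(string $robotsTxt): array {
  $prefixes = [];
  foreach (preg_split('/\R/', $robotsTxt) as $line) {
    // Drop "#" comments and surrounding whitespace.
    $line = trim(preg_replace('/#.*/', '', $line));
    // Field names are case-insensitive.
    if (stripos($line, 'Disallow:') === 0) {
      $path = trim(substr($line, strlen('Disallow:')));
      // An empty Disallow value disallows nothing.
      if ($path !== '') {
        $prefixes[] = $path;
      }
    }
  }
  return $prefixes;
}

/**
 * Returns true if the URL path starts with any disallowed prefix.
 */
function robots_is_disallowed(string $path, array $prefixes): bool {
  foreach ($prefixes as $prefix) {
    if (str_starts_with($path, $prefix)) {
      return true;
    }
  }
  return false;
}

// Hypothetical setup: Pathauto pattern "hidden/[node:title]" for the
// content type, plus this rule in the site's robots.txt.
$prefixes = robots_disallow_prefixes("User-agent: *\nDisallow: /hidden/\n");

var_dump(robots_is_disallowed('/hidden/my-secret-page', $prefixes)); // bool(true)
var_dump(robots_is_disallowed('/node/42', $prefixes));                // bool(false)
```

Note that `/node/42` is not matched. The rule only covers the alias path, so it relies on the generic node/* URLs either not being linked anywhere or redirecting to the alias.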
Other approaches:
In general, a search engine will index every page it finds on your site unless told otherwise. Most major search engines, such as Google, Bing, Yahoo or Yandex, obey the noindex directive. At the time of writing, of the top 5 search engines used worldwide, only the Chinese search engine Baidu still ignores noindex and requires a robots.txt with a Disallow directive instead.
noindex can be given either
- in an HTTP response header: X-Robots-Tag: noindex
- or as a meta tag: <meta name="robots" content="noindex">
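Whichever of the two ways you choose, it helps to verify what a page actually delivers. Below is a minimal PHP sketch that checks a fetched response for noindex in either place. The function names are hypothetical, and it needs PHP's dom extension, which is enabled in most PHP builds.

```php
<?php

/**
 * Checks whether a comma-separated robots directive list contains "noindex".
 */
function robots_directives_contain_noindex(string $value): bool {
  $directives = array_map('trim', explode(',', strtolower($value)));
  return in_array('noindex', $directives, true);
}

/**
 * Returns true if a response carries "noindex" in an X-Robots-Tag header
 * or in a <meta name="robots"> tag.
 *
 * Simplified sketch: crawler-specific forms such as
 * "X-Robots-Tag: googlebot: noindex" or <meta name="googlebot"> are ignored.
 *
 * @param array<string, string> $headers
 *   Response headers, keyed by header name.
 * @param string $html
 *   The response body.
 */
function response_has_noindex(array $headers, string $html): bool {
  // HTTP header names are case-insensitive.
  foreach ($headers as $name => $value) {
    if (strcasecmp((string) $name, 'X-Robots-Tag') === 0
      && robots_directives_contain_noindex($value)) {
      return true;
    }
  }

  // A blank body has no meta tags (and loadHTML() rejects an empty string).
  if (trim($html) === '') {
    return false;
  }

  // Parse the body, collecting libxml warnings instead of printing them.
  $previous = libxml_use_internal_errors(true);
  $doc = new DOMDocument();
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  foreach ($doc->getElementsByTagName('meta') as $meta) {
    if (strcasecmp($meta->getAttribute('name'), 'robots') === 0
      && robots_directives_contain_noindex($meta->getAttribute('content'))) {
      return true;
    }
  }
  return false;
}

$html = '<html><head><meta name="robots" content="noindex"></head><body></body></html>';
var_dump(response_has_noindex(['Content-Type' => 'text/html'], $html)); // bool(true)
var_dump(response_has_noindex(['X-Robots-Tag' => 'noindex'], ''));     // bool(true)
```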
Meta tags:
To configure a corresponding meta tag, you can use the Metatag module, which lets you define meta tag defaults per content type. (Whether for this use case or in general, I highly recommend the Metatag module for any publicly accessible Drupal site that should be optimized for search engines or content sharing.)
Response headers:
If you (additionally?) want to go the response header way, you could create a custom module (e.g., my_noindex_module, with the usual my_noindex_module.info.yml) containing a response event subscriber that adds the header whenever a response for a node of your type is delivered:
src/EventSubscriber/MyNoindexResponseSubscriber.php:
<?php

namespace Drupal\my_noindex_module\EventSubscriber;

use Drupal\Core\Routing\RouteMatchInterface;
use Drupal\node\NodeInterface;
use Symfony\Component\EventDispatcher\EventSubscriberInterface;
use Symfony\Component\HttpKernel\Event\ResponseEvent;
use Symfony\Component\HttpKernel\KernelEvents;

/**
 * Response subscriber to add noindex HTTP headers to responses.
 *
 * Adds the noindex header to responses that were created for routes with
 * a `'my_noindex_type'` node as parameter.
 */
class MyNoindexResponseSubscriber implements EventSubscriberInterface {

  /**
   * The current route match service.
   *
   * @var \Drupal\Core\Routing\RouteMatchInterface
   */
  protected $routeMatch;

  /**
   * Constructs a new noindex response subscriber.
   *
   * @param \Drupal\Core\Routing\RouteMatchInterface $route_match
   *   The current route match service.
   */
  public function __construct(RouteMatchInterface $route_match) {
    $this->routeMatch = $route_match;
  }

  /**
   * Sets extra HTTP headers for the custom node type.
   *
   * ResponseEvent and isMainRequest() are the names used by current Symfony
   * versions (Drupal 10+). Older Drupal releases use FilterResponseEvent and
   * isMasterRequest() instead.
   */
  public function onResponse(ResponseEvent $event) {
    // Only act on the main request, not on sub-requests.
    if (!$event->isMainRequest()) {
      return;
    }

    // Get the node from the route parameters.
    $node = $this->routeMatch->getParameter('node');

    // Bail out if there is no node or the node type doesn't
    // match our `'my_noindex_type'` type.
    // You may have to adjust the node type here.
    if (
      !$node instanceof NodeInterface
      || $node->getType() !== 'my_noindex_type'
    ) {
      return;
    }

    $response = $event->getResponse();
    $response->headers->set('X-Robots-Tag', 'noindex');
  }

  /**
   * {@inheritdoc}
   */
  public static function getSubscribedEvents() {
    // The method name must match onResponse() above.
    $events[KernelEvents::RESPONSE][] = ['onResponse', -100];
    return $events;
  }

}
my_noindex_module.services.yml:
services:
  my_noindex_module.noindex_response_subscriber:
    class: 'Drupal\my_noindex_module\EventSubscriber\MyNoindexResponseSubscriber'
    arguments: ['@current_route_match']
    tags:
      - { name: event_subscriber }