Make a web server prevent parsing of certain HTML elements

technology-liker

7/4/23, 6:05 PM

The MediaWiki content management system creates many links to webpages that I don't want search engine crawlers to discover.

It's not only that I don't want them indexed, or even crawled: I don't want them discovered at all!

In theory, I can try to customize the skin (theme/template) of my MediaWiki website to remove the HTML elements linking to these webpages, but doing so sanely requires learning a tremendous amount about the MediaWiki architecture, which I'd prefer not to do if simpler solutions are available.

CSS `display: none` won't help, as the markup would still be present in the DOM.
JavaScript `document.querySelector("#x").remove();` won't help, as crawlers may discover the link element before the script runs.
I cannot use PHP 8.1.3 to undo its own earlier output, because once any markup containing such a link has been processed, it is served to the user.
I can use robots.txt to try to prevent crawling (if not indexing) of these pages, but since my website's URLs are multilingual and follow many patterns, this might be a hard task.

The only trick that might be left to help me is to somehow ask the server not to serve any such markup, selected by CSS ID or class.

Crude as this may be, can it work? If not, what other options do I have left?
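For illustration only, the idea of having the server drop markup by ID or class can be sketched as an output filter that re-emits the HTML while skipping matching elements. This is a minimal sketch in Python rather than MediaWiki's PHP, and not a tested MediaWiki solution. The names `ElementStripper` and `strip_elements` are hypothetical, the ID `t-whatlinkshere` is only an example, and the filter assumes well-formed markup in which every non-void element has an explicit closing tag.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ElementStripper(HTMLParser):
    """Re-emit HTML while dropping elements matched by id or class (hypothetical helper)."""

    def __init__(self, blocked_ids=(), blocked_classes=()):
        super().__init__(convert_charrefs=False)
        self.blocked_ids = set(blocked_ids)
        self.blocked_classes = set(blocked_classes)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def _is_blocked(self, attrs):
        attrs = dict(attrs)
        if attrs.get("id") in self.blocked_ids:
            return True
        classes = (attrs.get("class") or "").split()
        return any(c in self.blocked_classes for c in classes)

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth += 1
        elif self._is_blocked(attrs):
            if tag not in VOID_TAGS:
                self.skip_depth = 1
        else:
            self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth and not self._is_blocked(attrs):
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag not in VOID_TAGS:
                self.skip_depth -= 1
        else:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")

    def handle_comment(self, data):
        if not self.skip_depth:
            self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def strip_elements(html, blocked_ids=(), blocked_classes=()):
    """Return html with every element matching a blocked id or class removed."""
    parser = ElementStripper(blocked_ids, blocked_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


page = ('<ul><li id="t-whatlinkshere"><a href="/wiki/Special:WhatLinksHere/Main_Page">'
        'What links here</a></li><li><a href="/wiki/Main_Page">Main page</a></li></ul>')
print(strip_elements(page, blocked_ids={"t-whatlinkshere"}))
# prints: <ul><li><a href="/wiki/Main_Page">Main page</a></li></ul>
```

A filter like this would have to run on the complete response before it leaves the server, and it removes the element for human visitors as well as for crawlers.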


php

robots.txt

mediawiki

javascript

css

Mat

7/4/23, 6:20 PM

If you don't want stuff discovered, don't put it on the public web. Keep your private stuff behind required authentication.

Tero Kilkanen

7/5/23, 11:07 AM

If MediaWiki does not support your requirements, you should look into other software for the purpose that supports the requirements. That is the only reasonable and maintainable way to reach your objectives. All other methods require lots of effort and can have many undesired side effects.

technology-liker

7/5/23, 11:59 AM

@TeroKilkanen I strongly agree. I would migrate to Drupal, but the site already has 2400 webpages; manually transferring the content could take about 4 months and would be hard, and I also like MediaWiki syntax a lot.

technology-liker

7/5/23, 11:59 AM

I can use **robots.txt** to try to prevent crawling (if not indexing) of these pages, but since my website's URLs are multilingual and follow many patterns, this might be a hard task. Still, it is much easier than migrating to Drupal.
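One way to keep the multilingual patterns manageable might be to generate the rules from a list of localized namespace names instead of writing each line by hand. The sketch below is in Python and rests on assumptions: the helper `robots_rules` is hypothetical, the `/wiki/` article path and the namespace names are only examples, and it relies on `Disallow` matching URL paths by prefix.

```python
from urllib.parse import quote


def robots_rules(namespaces, article_path="/wiki/", user_agent="*"):
    """Build robots.txt text with one Disallow prefix per localized namespace name."""
    lines = [f"User-agent: {user_agent}"]
    for name in namespaces:
        # Percent-encode non-ASCII names so the rule matches the encoded URL form.
        lines.append(f"Disallow: {article_path}{quote(name)}:")
    return "\n".join(lines) + "\n"


# Example namespace names (assumed; replace with the ones the wiki really uses).
print(robots_rules(["Special", "Spezial", "Especial", "Служебная"]))
```

This only asks well-behaved crawlers not to fetch the pages; the links themselves remain in the served markup.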


