Score:0

Using apache reverse proxy to send all requests for /blog to internal wordpress server

I have a website written in react, and now I wanted to add a blog section to the site. The blog is going to be based on wordpress.

The react app runs in a docker container, and I use the wordpress docker container to run the wordpress blog.

In order to access the website, I use another container running apache and acting as a reverse proxy.

Inside the httpd.conf file for the apache container, I have the following section:

<VirtualHost *:80>
    <Location "/">
        ProxyPreserveHost On
        ProxyPass "${REACT_SERVER}/"
        ProxyPassReverse "${REACT_SERVER}/"
    </Location>

    <Location /blog>
        ProxyPreserveHost On
        ProxyPass "${BLOG_SERVER}/"
        ProxyPassReverse "${BLOG_SERVER}/"
        ProxyPassReverseCookiePath  "/"  "/blog"
    </Location>

    # more config for handling websockets
</VirtualHost>

The variables REACT_SERVER and BLOG_SERVER come from the environment.

The problem I'm having is that when I try to access the blog, apache successfully forwards my request to the internal wordpress site. But when wordpress issues its own redirect, it uses the same host as apache with a path that does not start with /blog. My react app then tries to handle the request, eventually gives up, and redirects to the home page.

Here is an example using curl:

➜ curl -v http://localhost:3005/blog/
*   Trying 127.0.0.1:3005...
* Connected to localhost (127.0.0.1) port 3005 (#0)
> GET /blog/ HTTP/1.1
> Host: localhost:3005
> User-Agent: curl/7.74.0
> Accept: */*
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 302 Found
< Date: Fri, 20 Aug 2021 16:27:32 GMT
< Server: Apache/2.4.48 (Debian)
< X-Powered-By: PHP/7.4.22
< Expires: Wed, 11 Jan 1984 05:00:00 GMT
< Cache-Control: no-cache, must-revalidate, max-age=0
< X-Redirect-By: WordPress
< Location: http://localhost:3005/wp-admin/install.php
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host localhost left intact

As you can see, after the X-Redirect-By header, the Location path starts with /wp-admin instead of /blog/wp-admin.

From the docs on ProxyPassReverse:

For example, suppose the local server has address http://example.com/; then

ProxyPass         "/mirror/foo/" "http://backend.example.com/"
ProxyPassReverse  "/mirror/foo/" "http://backend.example.com/"
ProxyPassReverseCookieDomain  "backend.example.com" "public.example.com"
ProxyPassReverseCookiePath  "/"  "/mirror/foo/"

will not only cause a local request for the http://example.com/mirror/foo/bar to be internally converted into a proxy request to http://backend.example.com/bar (the functionality which ProxyPass provides here). It also takes care of redirects which the server backend.example.com sends when redirecting http://backend.example.com/bar to http://backend.example.com/quux . Apache httpd adjusts this to http://example.com/mirror/foo/quux before forwarding the HTTP redirect response to the client. Note that the hostname used for constructing the URL is chosen in respect to the setting of the UseCanonicalName directive.

and it seems that this is all that's required for this to work, but it still doesn't.

And if you are wondering, yes I have tried the plain (without the Location directive):

ProxyPass "/blog/" "${BLOG_SERVER}/"
ProxyPassReverse "/blog/" "${BLOG_SERVER}/"
ProxyPassReverseCookiePath  "/"  "/blog"

# etc...

And I also get the same results.

What am I missing?

Score:2

This looks more like a WordPress issue than a proxy misconfiguration. You need to tell WordPress that it lives inside a subdirectory: right now WordPress itself is redirecting you to http://localhost:3005/wp-admin/install.php (note the X-Redirect-By: WordPress header in your curl output), because it's not aware it's being served under /blog.

Option 1. One way to solve this is to tell WordPress its new base URL in the wp-config.php file:

define('WP_HOME','http://example.com/blog');
define('WP_SITEURL','http://example.com/blog');

Option 2. Another way to handle this is to edit WordPress's .htaccess file.

Your .htaccess file in WordPress should look something like this:

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>

This would be the .htaccess file inside the WordPress docker container.

smac89
Thanks for your answer. Indeed using `WP_HOME` and `WP_SITEURL` would have been one way to solve this, but I wanted to see if it can also be done using just apache, mod_proxy, which is why my [answer](https://serverfault.com/a/1075300/257206) also exists.
Score:0

Ok I think I figured out the problem.

Wordpress uses two variables to determine:

  1. Which url leads to the location of your wordpress site (WP_HOME)
  2. Which url is used to load resources for your wordpress site (WP_SITEURL)

The first thing I had to do for the wordpress container was to make sure that these urls match the internal container url, i.e. $BLOG_SERVER. Because I use docker-compose, it was easy to inject this url through the WORDPRESS_CONFIG_EXTRA environment variable.

wordpress-blog:
  image: wordpress:5
  depends_on:
    - blog-db
  environment:
    WORDPRESS_DB_HOST: blog-db
    WORDPRESS_DB_NAME: blog
    WORDPRESS_DB_USER: wordpress
    WORDPRESS_DB_PASSWORD: wordpress
    WORDPRESS_CONFIG_EXTRA: |
      define('WP_HOME', 'http://wordpress-blog');
      define('WP_SITEURL', 'http://wordpress-blog');
  volumes:
    - wordpress:/var/www/html

With that done, we can focus on the proxy.

Before I went the route of a full reverse proxy, I assumed the proxy would simply take over every request for /blog/ and return pages from the proxied site that look as if they were served directly from wordpress. What I didn't account for is that this only holds for server-side rendered pages.

Starting with the new VirtualHost configuration, this is now what it looks like:

<VirtualHost *:80>
    ProxyPass "/blog/" "${BLOG_SERVER}/"
    ProxyPass "/" "${REACT_SERVER}/"

    <Location "/">
        ProxyPreserveHost On
        ProxyErrorOverride On
        ProxyPassReverse "${REACT_SERVER}/"
    </Location>

    <Location "/blog/">
        ProxyPreserveHost Off
        ProxyPassReverse "${BLOG_SERVER}/"
        ProxyPassReverseCookiePath  "/"  "/blog/"
        ProxyErrorOverride On

        ProxyHTMLEnable On
        ProxyHTMLExtended On
        ProxyHTMLURLMap "${BLOG_SERVER}/"
        SetOutputFilter INFLATE;proxy-html;DEFLATE
        # ProxyPassReverseCookieDomain "%{HTTP_HOST:${BLOG_SERVER}}" %{HTTP_HOST}
    </Location>
</VirtualHost>

The next thing I had to do for this proxy to start behaving like a proxy was to add this line:

ProxyPreserveHost Off

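With ProxyPreserveHost Off, the Host header sent to wordpress is the internal host from ProxyPass rather than the one the client used, so wordpress builds its redirects and links against the internal url (matching WP_HOME above). That gives ProxyPassReverse a Location header it can actually match and rewrite: for example, ${BLOG_SERVER}/wp-admin/install.php becomes /blog/wp-admin/install.php on the proxy's host. The same reasoning applies when we start dealing with proxying html.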


Next the ProxyPass directives were moved out of the Location container, and directly into VirtualHost.

ProxyPass "/blog/" "${BLOG_SERVER}/"
ProxyPass "/" "${REACT_SERVER}/"

The reason is that ProxyPass inside the Location blocks was not matching reliably: sometimes the / path won over the /blog/ path. Top-level ProxyPass rules are checked in configuration order and the first match wins, so listing /blog/ before / makes the outcome predictable. I decided to specify the proxies on their own (I saw an example here), then adjust the responses inside the Location containers.


At this point, the reverse proxy is working! However, the html in the pages had links pointing to the internal url of the wordpress site. This is where mod_proxy_html comes in. It can rewrite links in html to point to the reverse proxy: anywhere it finds a link pointing to the internal blog site, the link is replaced with one that uses the reverse proxy.

ProxyHTMLEnable On
ProxyHTMLExtended On
ProxyHTMLURLMap "${BLOG_SERVER}/"
SetOutputFilter INFLATE;proxy-html;DEFLATE

The last line might introduce a bottleneck because it decompresses the payload from the blog site, rewrites all the urls to point to the reverse proxy, then compresses it again. If you don't want this, another way is to ask the backend for uncompressed responses in the first place:

RequestHeader    unset  Accept-Encoding
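
As a rough sketch of that alternative, the Location block could drop the INFLATE/DEFLATE wrapping and strip Accept-Encoding from the proxied request instead. This assumes mod_headers is loaded (it is not in the module list at the end of this answer), and it keeps the other directives exactly as in the configuration above:

```apache
# Assumption: mod_headers is available for RequestHeader.
# LoadModule headers_module modules/mod_headers.so

<Location "/blog/">
    # Remove Accept-Encoding before the request reaches the backend,
    # so wordpress replies uncompressed and proxy-html can parse the
    # body without an inflate/deflate round trip.
    RequestHeader unset Accept-Encoding

    ProxyHTMLEnable On
    ProxyHTMLExtended On
    ProxyHTMLURLMap "${BLOG_SERVER}/"
    # The "SetOutputFilter INFLATE;proxy-html;DEFLATE" line is left out
    # here. Whether an explicit "SetOutputFilter proxy-html" is still
    # needed next to ProxyHTMLEnable On is worth verifying on your build.
</Location>
```

The trade-off is that responses may then reach the client uncompressed unless compression is re-added on the proxy side.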

Even with all this in place, the solution was still not perfect: any javascript loaded on the page that makes requests to the internal site will not have those requests routed through the proxy.

One solution to this would be to go with the first option proposed by the other answer on this question, and change WP_SITEURL to point to the reverse proxy directly.

Yet another solution is to use a Service worker to intercept network requests. I like this solution because it does not tightly couple the blog site to the reverse proxy. I could imagine that it wouldn't be too far-fetched (heh) an idea to inject the service worker into any html pages requested from the proxy, and have that service worker intercept all requests which match the internal blog site url, and replace them with the reverse proxy url.

I went with neither of these. After much deliberation, I think hosting wordpress on a subdomain would be better for my needs. Something like blog.example.com is what I might go for, but that would be work for another day.
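
For reference, a minimal sketch of what that subdomain setup might look like. The hostname blog.example.com is just the placeholder from above, and this assumes WP_HOME and WP_SITEURL are changed to http://blog.example.com so that wordpress generates public urls itself:

```apache
<VirtualHost *:80>
    # Hypothetical public hostname for the blog
    ServerName blog.example.com

    # Pass the original Host header through so wordpress sees
    # blog.example.com, matching its WP_HOME / WP_SITEURL.
    ProxyPreserveHost On

    # No path prefix to add or strip: "/" maps to "/" on the backend.
    ProxyPass        "/" "${BLOG_SERVER}/"
    ProxyPassReverse "/" "${BLOG_SERVER}/"
</VirtualHost>
```

Because the public and internal paths are identical in this layout, the mod_proxy_html rewriting and the cookie path mapping should no longer be needed.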


In conclusion, reverse proxies are difficult to implement properly with apache. I don't know if the grass is greener on the nginx side, but maybe someday we'll check it out. The solution I was going for assumed server-side-only content, which would have proven to be a perfect candidate to be proxied, but alas dynamically loaded content will require more work.

Sources

Apache modules enabled for html proxying

LoadModule deflate_module modules/mod_deflate.so
LoadModule xml2enc_module modules/mod_xml2enc.so
LoadModule proxy_html_module modules/mod_proxy_html.so
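
The lines above cover only the html rewriting. As a sketch, these are the other modules the configuration in this answer relies on, assuming the same modules/ path as above (mod_headers is only needed for the RequestHeader variant):

```apache
# Core reverse proxying (ProxyPass, ProxyPassReverse, ProxyPreserveHost)
LoadModule proxy_module modules/mod_proxy.so
# HTTP backend support for http:// proxy targets
LoadModule proxy_http_module modules/mod_proxy_http.so
# Only needed if using "RequestHeader unset Accept-Encoding"
LoadModule headers_module modules/mod_headers.so
```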