Ok I think I figured out the problem.
Wordpress uses two settings to determine:
- Which url leads to the location of your wordpress site (WP_HOME)
- Which url is used to load resources for your wordpress site (WP_SITEURL)
The first thing I had to do for the wordpress container was to make sure that these urls match the internal container url, i.e. $BLOG_SERVER. Because I use docker-compose, it was easy to inject this url through the WORDPRESS_CONFIG_EXTRA environment variable:
wordpress-blog:
  image: wordpress:5
  depends_on:
    - blog-db
  environment:
    WORDPRESS_DB_HOST: blog-db
    WORDPRESS_DB_NAME: blog
    WORDPRESS_DB_USER: wordpress
    WORDPRESS_DB_PASSWORD: wordpress
    WORDPRESS_CONFIG_EXTRA: |
      define('WP_HOME', 'http://wordpress-blog');
      define('WP_SITEURL', 'http://wordpress-blog');
  volumes:
    - wordpress:/var/www/html
With that done, we can focus on the proxy.
Before I went the route of a full reverse proxy, I was under the assumption that the proxy would somehow take over every request for /blog/ and return pages from the proxied site that look as if they were served directly from wordpress. One thing I didn't account for was that this only holds for server-side rendered pages.
Starting with the new VirtualHost configuration, this is what it now looks like:
<VirtualHost *:80>
    ProxyPass "/blog/" "${BLOG_SERVER}/"
    ProxyPass "/" "${REACT_SERVER}/"
    <Location "/">
        ProxyPreserveHost On
        ProxyErrorOverride On
        ProxyPassReverse "${DEV_SERVER}/"
    </Location>
    <Location "/blog/">
        ProxyPreserveHost Off
        ProxyPassReverse "${BLOG_SERVER}/"
        ProxyPassReverseCookiePath "/" "/blog/"
        ProxyErrorOverride On
        ProxyHTMLEnable On
        ProxyHTMLExtended On
        ProxyHTMLURLMap "${BLOG_SERVER}/"
        SetOutputFilter INFLATE;proxy-html;DEFLATE
        # ProxyPassReverseCookieDomain "%{HTTP_HOST:${BLOG_SERVER}}" %{HTTP_HOST}
    </Location>
</VirtualHost>
The next thing I had to do for this proxy to start behaving like a proxy was to add this line:
ProxyPreserveHost Off
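This makes Apache send the backend's hostname (taken from the ProxyPass target) in the Host header instead of the hostname the client asked for, so requests reaching wordpress do not look like they were addressed to us (the proxy). The reason for this will become obvious once we start dealing with proxying html.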
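To make the effect of that directive concrete, here is a tiny model of which Host header the blog container ends up seeing. It is only an illustration, not Apache's code, and the function name and the example.com hostname are made up for the example.

```javascript
// Illustrative model only: which Host header the backend receives.
// `upstreamHostHeader` and "example.com" are hypothetical names/values.
function upstreamHostHeader(preserveHost, clientHost, backendUrl) {
  // ProxyPreserveHost On:  forward the Host header the client sent.
  // ProxyPreserveHost Off: use the host from the ProxyPass target.
  return preserveHost ? clientHost : new URL(backendUrl).host;
}

console.log(upstreamHostHeader(true, "example.com", "http://wordpress-blog/"));  // example.com
console.log(upstreamHostHeader(false, "example.com", "http://wordpress-blog/")); // wordpress-blog
```

With Off, wordpress sees the same hostname that WP_HOME and WP_SITEURL were set to.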
Next, the ProxyPass directives were moved out of the Location container and directly into VirtualHost.
ProxyPass "/blog/" "${BLOG_SERVER}/"
ProxyPass "/" "${REACT_SERVER}/"
The reason for this is that the Location blocks were matching requests too late, and sometimes the / path won over the /blog/ path. I needed it to be more reliable, so I decided to specify the proxies on their own (I saw an example here), then adjust the paths inside the Location containers. ProxyPass rules are checked in the order they appear and the first match wins, which is why /blog/ is listed before /.
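The ordering matters because top-level ProxyPass rules are tried in configuration order and the first matching prefix is used. A rough sketch of that lookup, assuming http://react-app as a stand-in for ${REACT_SERVER} (the rule table and function name are hypothetical too):

```javascript
// Hypothetical rule table mirroring the two ProxyPass lines, in config order.
// "http://react-app/" is an assumed placeholder for ${REACT_SERVER}.
const rules = [
  { prefix: "/blog/", target: "http://wordpress-blog/" },
  { prefix: "/", target: "http://react-app/" },
];

// Return the backend url for a request path: the first matching prefix wins.
function resolveProxyTarget(path, orderedRules) {
  for (const rule of orderedRules) {
    if (path.startsWith(rule.prefix)) {
      return rule.target + path.slice(rule.prefix.length);
    }
  }
  return null; // no rule matched
}

console.log(resolveProxyTarget("/blog/2020/post", rules)); // http://wordpress-blog/2020/post
console.log(resolveProxyTarget("/about", rules));          // http://react-app/about

// With the rules reversed, "/" swallows the blog request as well.
console.log(resolveProxyTarget("/blog/2020/post", [...rules].reverse())); // http://react-app/blog/2020/post
```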
At this point, the reverse proxy is now working! However, the html in the pages had links that were pointing to the internal url of the wordpress site. This is where mod_proxy_html comes in. It can be used to rewrite all links in html to point to the reverse proxy. Anywhere it finds a link pointing to the internal blog site, the link is replaced with one that uses the reverse proxy.
ProxyHTMLEnable On
ProxyHTMLExtended On
ProxyHTMLURLMap "${BLOG_SERVER}/"
SetOutputFilter INFLATE;proxy-html;DEFLATE
The last line might introduce a bottleneck because it decompresses the payload from the blog site, rewrites all the urls to point to the reverse proxy, then compresses it again. If you don't want this, another option is to strip the Accept-Encoding header from proxied requests so that the blog site sends uncompressed responses in the first place (this needs mod_headers):
RequestHeader unset Accept-Encoding
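To picture what the link rewriting described above amounts to, here is a much simplified sketch. It is not how mod_proxy_html works internally (the module parses the markup properly rather than using a regex), and it assumes the public prefix is /blog/. The function name is made up for the example.

```javascript
// Simplified illustration: rewrite href/src/action attribute values that
// start with the internal blog url so they point at the proxy instead.
// `rewriteLinks` is a hypothetical helper, not part of any Apache module.
function rewriteLinks(html, internalBase, proxyBase) {
  const attr = /\b(href|src|action)=(["'])(.*?)\2/gi;
  return html.replace(attr, (match, name, quote, value) => {
    if (!value.startsWith(internalBase)) return match; // leave other links alone
    return `${name}=${quote}${proxyBase}${value.slice(internalBase.length)}${quote}`;
  });
}

const page = '<a href="http://wordpress-blog/hello-world/">Hello</a>';
console.log(rewriteLinks(page, "http://wordpress-blog/", "/blog/"));
// <a href="/blog/hello-world/">Hello</a>
```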
Even with all this in place, the solution was still not perfect: any javascript file loaded on the page that makes a request to the internal site will not have its requests routed through the proxy.
One solution to this would be to go with the first solution proposed by the current answer on this question, changing WP_SITEURL to point to the reverse proxy directly.
Yet another solution is to use a Service worker to intercept network requests. I like this solution because it does not tightly couple the blog site to the reverse proxy. I could imagine that it wouldn't be too far-fetched (heh) an idea to inject the service worker into any html pages requested from the proxy, and have that service worker intercept all requests which match the internal blog site url, and replace them with the reverse proxy url.
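I never built this, but a minimal sketch of the service worker idea might look like the following. It assumes the internal site is http://wordpress-blog and the proxy serves it under /blog; both constants and the toProxyUrl helper are hypothetical names. It also only handles GET requests, to keep things short.

```javascript
// Assumed values: internal blog origin and the public proxy prefix.
const INTERNAL_ORIGIN = "http://wordpress-blog";
const PROXY_PREFIX = "/blog";

// Map an internal blog url to its path behind the proxy.
// Returns null when the url does not belong to the internal blog site.
function toProxyUrl(requestUrl, internalOrigin, proxyPrefix) {
  const url = new URL(requestUrl);
  if (url.origin !== internalOrigin) return null;
  return proxyPrefix + url.pathname + url.search;
}

// Register the fetch handler only when actually running as a service worker.
if (typeof ServiceWorkerGlobalScope !== "undefined" &&
    self instanceof ServiceWorkerGlobalScope) {
  self.addEventListener("fetch", (event) => {
    if (event.request.method !== "GET") return; // sketch: GET only
    const proxied = toProxyUrl(event.request.url, INTERNAL_ORIGIN, PROXY_PREFIX);
    if (proxied === null) return; // not the blog: let the browser handle it
    event.respondWith(fetch(proxied));
  });
}
```

A request to http://wordpress-blog/wp-json/wp/v2/posts?per_page=5 would be re-issued as /blog/wp-json/wp/v2/posts?per_page=5, while requests to any other origin pass through untouched.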
I went with neither of these. After much deliberation, I think hosting wordpress on a subdomain would be better for my needs. Something like blog.example.com is what I might go for, but that is work for another day.
In conclusion, reverse proxies are difficult to implement properly with apache. I don't know if the grass is greener on the nginx side, but maybe someday we'll check it out. The solution I was going for assumed server-side-only content, which would have proven to be a perfect candidate to be proxied, but alas dynamically loaded content will require more work.
Sources
Apache modules enabled for html proxying
LoadModule deflate_module modules/mod_deflate.so
LoadModule xml2enc_module modules/mod_xml2enc.so
LoadModule proxy_html_module modules/mod_proxy_html.so