Single-page applications (SPAs) are quickly replacing traditional server-side rendered web application user interfaces. The chart below demonstrates this trend by comparing the currently most popular SPA framework, AngularJS, against traditional server-side rendering technologies. The historically popular PHP is in decline too, but it is left out to keep the chart readable.
It’s absolutely crucial for any public site to be accessible to search crawlers and social media platforms. To achieve this reliably, crawlers and bots need to be served prerendered static HTML of the requested content instead of the SPA, because many of them do not execute the JavaScript an SPA relies on. Some of the latest SPA frameworks, like ReactJS and Angular 2, provide means for server-side prerendering, but with most others, like AngularJS 1.X, prerendering has to be implemented manually.
Now all you need to do is detect when a bot or a crawler is requesting the page and respond with the prerendered static HTML. The ‘User-Agent’ HTTP request header is the easiest way to detect crawlers and bots. There are ready-made configurations available for the most popular platforms, like this one for Nginx. There is one major problem with User-Agent headers and CloudFront, which I will describe in detail below.
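To make the detection step concrete, here is a minimal Python sketch of User-Agent based routing. The crawler list is illustrative and far from complete, and the `prerendered/<path>.html` snapshot layout and both function names are assumptions made for this example, not part of any particular platform's configuration.

```python
import re

# Illustrative, incomplete list of crawler User-Agent substrings (assumption).
CRAWLER_PATTERN = re.compile(
    r"googlebot|bingbot|yandex|baiduspider|twitterbot|"
    r"facebookexternalhit|linkedinbot|slackbot",
    re.IGNORECASE,
)


def is_crawler(user_agent):
    """Return True when the User-Agent value looks like a known bot or crawler."""
    return bool(user_agent) and CRAWLER_PATTERN.search(user_agent) is not None


def choose_document(user_agent, path):
    """Pick the file to serve: a prerendered snapshot for bots, the SPA shell otherwise."""
    if is_crawler(user_agent):
        # Hypothetical layout: snapshots are stored as prerendered/<path>.html
        name = path.strip("/") or "index"
        return f"prerendered/{name}.html"
    # Regular browsers always get the SPA bootstrap page.
    return "index.html"
```

A real setup would more likely do this matching in the web server configuration itself, but the logic is the same: one pattern match on the header, then a choice between two documents.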
Amazon Web Services (AWS) has a great managed Content Delivery Network (CDN) service called Amazon CloudFront. In a nutshell, CloudFront is a network of distributed edge location servers that provide fast access to resources by caching them close to the user. CloudFront is a must-have for almost any public site running on AWS.
When a resource is requested from the service URL, the request is routed to a CloudFront edge location. CloudFront replies with the cached object if one is present; otherwise it passes the request to the origin. The origin is the actual service running in an AWS datacenter.
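The request flow can be pictured with a toy model in Python. This is only a sketch of the hit-or-forward idea described above, not of how CloudFront is actually implemented; `EdgeLocation` and `origin` are hypothetical names, and real edges also deal with expiry, invalidation, and much more.

```python
class EdgeLocation:
    """Toy model of a CDN edge location: serve from cache, else ask the origin."""

    def __init__(self, fetch_from_origin):
        self._fetch_from_origin = fetch_from_origin  # callable(path) -> body
        self._cache = {}
        self.origin_requests = 0

    def get(self, path):
        if path in self._cache:
            # Cached object is present: reply without contacting the origin.
            return self._cache[path], "hit"
        # Not cached: pass the request to the origin and remember the result.
        self.origin_requests += 1
        body = self._fetch_from_origin(path)
        self._cache[path] = body
        return body, "miss"


def origin(path):
    # Stand-in for the actual service running in the AWS datacenter.
    return f"content of {path}"


edge = EdgeLocation(origin)
first = edge.get("/app.js")   # first request goes to the origin
second = edge.get("/app.js")  # repeat request is answered from the edge cache
```

After both calls the origin has been contacted only once, which is the whole point of putting the edge in front of it.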
To work efficiently, CloudFront doesn’t cache requests based on the User-Agent header by default. As the CloudFront documentation puts it:
You can configure CloudFront to cache objects based on values in the User-Agent header, but we don’t recommend it. The User-Agent header has a lot of possible values, and caching based on those values would cause CloudFront to forward significantly more requests to your origin.
This is problematic since the web server running on the origin needs the User-Agent header to detect bots and crawlers.
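The trade-off is easy to see with a small simulation. The sketch below counts how many requests would reach the origin for the same page under two hypothetical cache keys: the path alone, and the path combined with the User-Agent value. The request list and the helper name are made up for illustration.

```python
def count_origin_requests(requests, key_fn):
    """Count cache misses for a stream of (path, user_agent) requests."""
    cached_keys = set()
    misses = 0
    for path, user_agent in requests:
        key = key_fn(path, user_agent)
        if key not in cached_keys:
            # First time this key is seen: the origin has to serve it.
            cached_keys.add(key)
            misses += 1
    return misses


# Four page loads of the same URL, each with a different User-Agent value.
requests = [
    ("/", "Mozilla/5.0 Chrome/120.0"),
    ("/", "Mozilla/5.0 Firefox/121.0"),
    ("/", "Mozilla/5.0 Safari/605.1"),
    ("/", "Googlebot/2.1"),
]

# Key on the path only: one origin request, but the origin never gets to
# decide per User-Agent, so the bot receives whatever was cached first.
path_only = count_origin_requests(requests, lambda path, ua: path)

# Key on path and User-Agent: every distinct header value is a cache miss.
path_and_agent = count_origin_requests(requests, lambda path, ua: (path, ua))
```

With the path-only key the crawler is served the same cached object as the browsers, and with the combined key almost nothing is ever reused. Neither outcome is what a prerendering setup wants.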
Unfortunately, there is no perfect solution for this issue. The best option is to configure CloudFront to pass all page load requests to the origin with the User-Agent header included. The User-Agent header is not passed by default, so forwarding needs to be configured explicitly. This means CloudFront can’t cache any page load requests, and the bootstrap index.html is always returned from the origin. The good news is that you can and should have all other resources cached in CloudFront to minimize the performance hit.
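As a rough sketch of what that split could look like, the Python below builds two cache behavior fragments: one for page loads that forwards User-Agent and disables caching, and one for static resources that forwards no headers and caches for a long time. The field names follow the legacy `ForwardedValues` shape of the CloudFront distribution configuration as I understand it, the fragments are deliberately incomplete, and the origin ID, the `/assets/*` path pattern, and the TTL values are assumptions. Check everything against the current CloudFront API reference before using it.

```python
def page_behavior(origin_id):
    """Fragment for page loads: forward User-Agent and do not cache."""
    return {
        "TargetOriginId": origin_id,
        "ForwardedValues": {
            "QueryString": False,
            "Cookies": {"Forward": "none"},
            # Forward User-Agent so the origin can detect bots and crawlers.
            "Headers": {"Quantity": 1, "Items": ["User-Agent"]},
        },
        # TTLs of zero: every page load is passed to the origin.
        "MinTTL": 0,
        "DefaultTTL": 0,
        "MaxTTL": 0,
    }


def static_behavior(origin_id, path_pattern):
    """Fragment for static resources: forward no headers and cache at the edge."""
    return {
        "PathPattern": path_pattern,
        "TargetOriginId": origin_id,
        "ForwardedValues": {
            "QueryString": False,
            "Cookies": {"Forward": "none"},
            "Headers": {"Quantity": 0},
        },
        # Assumed TTLs: one day by default, up to one year.
        "MinTTL": 0,
        "DefaultTTL": 86400,
        "MaxTTL": 31536000,
    }


# Hypothetical origin ID and path pattern, for illustration only.
default_behavior = page_behavior("spa-origin")
assets_behavior = static_behavior("spa-origin", "/assets/*")
```

Keeping the two behaviors in separate functions mirrors the idea in the text: only the page load path pays the price of always hitting the origin.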
For an SPA to be accessible to all platforms, prerendering needs to happen. Prerendering sets special requirements for the infrastructure configuration.
Amazon CloudFront is a powerful content delivery network and should be leveraged for almost every public web service on AWS to ensure steady performance globally. To enable prerendering with CloudFront, some special configuration is required until AWS comes up with something more elegant.
Ideally, AWS would add a custom cache key header for crawlers and bots, as they have already implemented support for different device types:
If you want CloudFront to cache different versions of your objects based on the device a user is using to view your content, we recommend that you configure CloudFront to forward the applicable headers to your custom origin: