Avoiding SEO pitfalls when building Single Page Apps on AWS

User expectations for the web experience are rising fast, and it is becoming increasingly difficult to meet them with traditional server-side rendered web pages. Users want immediate feedback and fluid transitions. JavaScript Single Page Applications (SPAs) provide exactly that: highly responsive user interfaces without blocking full-page reloads.

SPAs are quickly replacing traditional server-side rendered user interfaces in web applications. The chart below illustrates this trend by comparing AngularJS, currently the most popular SPA framework, with traditional server-side rendering technologies. PHP, historically very popular, is also in decline but is left out to keep the chart readable.

Even though SPAs are great for building responsive, app-like user experiences and offer many benefits over server-side rendered static HTML sites, they come with a few fundamental drawbacks. The biggest issue is that most crawlers and bots cannot yet render JavaScript content. Google and Bing are the exception, to some extent.

It is absolutely crucial for any public site to be accessible to search crawlers and social media platforms. To achieve this reliably, crawlers and bots need to be served prerendered static HTML of the requested content instead of the SPA. Some of the latest SPA frameworks, such as ReactJS and Angular 2, provide means for server-side prerendering, but with most others, such as AngularJS 1.x, prerendering has to be implemented manually.

Prerendering

Prerendering means that the SPA is executed in a headless browser such as PhantomJS, and the resulting static HTML snapshot of the application state is returned to the requester instead of the SPA itself. This way the requester does not need to execute any JavaScript to get fully rendered HTML.

The easiest way to achieve prerendering is to use an open source Node.js application such as prerender. You can run it on your own machines or use a SaaS offering such as https://prerender.io/.

Now all you need to do is detect when a bot or crawler is requesting the page and respond with the prerendered static HTML. The User-Agent HTTP request header is the easiest way to detect crawlers and bots. Ready-made configurations are available for most popular platforms, such as this one for Nginx. There is, however, one major problem with User-Agent headers and CloudFront, which I describe in detail below.
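Here is a minimal TypeScript sketch of User-Agent based detection at the origin. It checks the header against a short, illustrative list of crawler patterns. Based on the result, it decides whether to fetch a prerendered snapshot or serve the SPA bootstrap page. Several pieces are hypothetical: the pattern list, the `chooseUpstream` function and the prerender server address `http://localhost:3000/`, which is assumed to render the page whose full URL is appended to its path. In production you would rather reuse a maintained crawler list, such as the ready-made configurations mentioned above.

```typescript
// Minimal sketch of User-Agent based crawler detection at the origin.
// The pattern list is an illustrative assumption, not an exhaustive list.
const BOT_PATTERNS: RegExp[] = [
  /googlebot/i,
  /bingbot/i,
  /yandex/i,
  /baiduspider/i,
  /facebookexternalhit/i,
  /twitterbot/i,
  /linkedinbot/i,
  /slackbot/i,
];

function isBot(userAgent: string | undefined): boolean {
  // A missing User-Agent is treated as a regular browser (assumption).
  if (!userAgent) return false;
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Hypothetical self-hosted prerender server that renders the page whose
// full URL is appended to its path.
const PRERENDER_BASE = "http://localhost:3000/";

type Upstream =
  | { kind: "prerender"; url: string } // fetch the HTML snapshot
  | { kind: "spa"; path: string };     // serve the SPA bootstrap page

function chooseUpstream(pageUrl: string, userAgent: string | undefined): Upstream {
  if (isBot(userAgent)) {
    return { kind: "prerender", url: PRERENDER_BASE + pageUrl };
  }
  // Browsers get index.html; the client-side router renders the route.
  return { kind: "spa", path: "/index.html" };
}

console.log(chooseUpstream("https://www.example.com/products/1", "Twitterbot/1.0").kind); // prerender
```

Make sure the patterns do not match the User-Agent of the headless browser that performs the rendering. Otherwise the origin would keep sending the renderer to the prerender service instead of serving it the SPA.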

CloudFront

Amazon Web Services (AWS) offers a great managed Content Delivery Network (CDN) service called Amazon CloudFront. In a nutshell, CloudFront is a network of distributed edge location servers that provides fast access to resources by caching them close to the user. CloudFront is a must-have for almost any public site running on AWS.

When a resource is requested from the service URL, the request is routed to a CloudFront edge location. CloudFront replies with the cached object if it is present, or otherwise passes the request on to the origin. The origin is the actual service running in an AWS data center.

To keep caching efficient, CloudFront does not cache based on the User-Agent header by default. As the AWS documentation puts it:

You can configure CloudFront to cache objects based on values in the User-Agent header, but we don’t recommend it. The User-Agent header has a lot of possible values, and caching based on those values would cause CloudFront to forward significantly more requests to your origin.

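This is problematic, since the web server running on the origin needs the User-Agent header to detect bots and crawlers. Unless the header is forwarded, CloudFront replaces it with the value "Amazon CloudFront" before passing the request on, so the origin cannot tell a crawler from a browser.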
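The toy model below illustrates how the cache key interacts with bot detection. It is not CloudFront's implementation, and `EdgeCache` and `demoOrigin` are hypothetical names. It assumes two behaviors, in the spirit of CloudFront's header-based caching. First, a forwarded User-Agent also becomes part of the cache key. Second, when the header is not forwarded, the origin only sees `Amazon CloudFront`.

```typescript
// Toy model of an edge cache in front of an origin. This is NOT how
// CloudFront is implemented; it only shows how the cache key and the
// forwarded User-Agent interact with bot detection at the origin.
type OriginFn = (path: string, userAgent: string) => string;

class EdgeCache {
  private readonly store = new Map<string, string>();
  private readonly origin: OriginFn;
  private readonly forwardUserAgent: boolean;
  originRequests = 0;

  constructor(origin: OriginFn, forwardUserAgent: boolean) {
    this.origin = origin;
    this.forwardUserAgent = forwardUserAgent;
  }

  get(path: string, userAgent: string): string {
    // A forwarded header also becomes part of the cache key.
    const key = this.forwardUserAgent ? `${path}|${userAgent}` : path;
    const cached = this.store.get(key);
    if (cached !== undefined) return cached; // edge hit
    this.originRequests++;
    // Without forwarding, the origin never sees the viewer's User-Agent.
    const seenByOrigin = this.forwardUserAgent ? userAgent : "Amazon CloudFront";
    const body = this.origin(path, seenByOrigin);
    this.store.set(key, body);
    return body;
  }
}

// Hypothetical origin: snapshot for crawlers, SPA bootstrap page for everyone else.
const demoOrigin: OriginFn = (path, userAgent) =>
  /googlebot|bingbot/i.test(userAgent) ? `snapshot:${path}` : "spa-shell";

const pageViews = ["Mozilla/5.0 Chrome/120.0", "Mozilla/5.0 Firefox/121.0", "Googlebot/2.1"];
const withoutUA = new EdgeCache(demoOrigin, false);
const withUA = new EdgeCache(demoOrigin, true);
for (const ua of pageViews) {
  withoutUA.get("/products/1", ua);
  withUA.get("/products/1", ua);
}
console.log(withoutUA.get("/products/1", "Googlebot/2.1"), withoutUA.originRequests); // spa-shell 1
console.log(withUA.get("/products/1", "Googlebot/2.1"), withUA.originRequests); // snapshot:/products/1 3
```

Without forwarding, the first visitor's response (the SPA shell) is cached and served to Googlebot as well. With forwarding, every distinct User-Agent string gets its own cache entry and its own origin request.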

Unfortunately, there is no perfect solution to this issue. The best option is to configure CloudFront to pass all page load requests to the origin along with the User-Agent header. The User-Agent header is not forwarded by default, so this has to be configured explicitly. As a consequence, CloudFront effectively cannot cache page loads, and the bootstrap index.html is always served from the origin. The good news is that you can, and should, have all other resources cached by CloudFront to minimize the performance hit.
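CloudFront evaluates a distribution's cache behaviors in order, matching the request path against each behavior's path pattern, and the default (`*`) behavior always comes last. The toy TypeScript model below sketches one possible split for an SPA. Static assets get their own cached behaviors, and the default behavior forwards User-Agent for page loads. The patterns `/assets/*`, `*.js` and `*.css` assume a typical build layout. The `CacheBehavior` shape is hypothetical and is not the CloudFront API.

```typescript
// Toy model of ordered, CloudFront-style cache behaviors for an SPA.
// The CacheBehavior shape is hypothetical and not the CloudFront API.
interface CacheBehavior {
  pathPattern: string;       // supports "*" and "?" wildcards
  forwardUserAgent: boolean; // origin needs the header to detect bots
  cacheAtEdge: boolean;      // false: page loads are effectively not cached
}

// Evaluated top to bottom; the default "*" behavior must come last.
const behaviors: CacheBehavior[] = [
  { pathPattern: "/assets/*", forwardUserAgent: false, cacheAtEdge: true },
  { pathPattern: "*.js", forwardUserAgent: false, cacheAtEdge: true },
  { pathPattern: "*.css", forwardUserAgent: false, cacheAtEdge: true },
  // Everything else, including "/" and "/index.html", is a page load.
  { pathPattern: "*", forwardUserAgent: true, cacheAtEdge: false },
];

function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, then translate the two wildcards.
  const escaped = pattern.replace(/[.+^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*").replace(/\?/g, ".") + "$");
}

function behaviorFor(path: string): CacheBehavior {
  // The first matching behavior wins; "*" matches any path.
  const match = behaviors.find((b) => globToRegExp(b.pathPattern).test(path));
  return match ?? behaviors[behaviors.length - 1];
}

console.log(behaviorFor("/products/42").forwardUserAgent); // true
console.log(behaviorFor("/assets/logo.png").cacheAtEdge); // true
```

With this split, only the small bootstrap HTML pays the origin round trip. The JavaScript bundles, stylesheets and images that make up most of the payload stay cached at the edge.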

Conclusion

JavaScript Single Page Applications can provide a great user experience and are quickly replacing traditional server-side rendered web frameworks. Web fundamentals change slowly, however, and only a few crawlers and bots can render JavaScript content. Search engine visibility and compatibility with social media platforms are crucial for any public website.

For an SPA to be accessible to all platforms, prerendering is needed. Prerendering places special requirements on the infrastructure configuration.

Amazon CloudFront is a powerful content delivery network and should be leveraged for almost every public web service on AWS to ensure steady performance globally. Enabling prerendering with CloudFront requires some special configuration until AWS comes up with something more elegant.

Ideally, AWS would add a custom cache key header for crawlers and bots, as they have already done for different device types:

If you want CloudFront to cache different versions of your objects based on the device a user is using to view your content, we recommend that you configure CloudFront to forward the applicable headers to your custom origin:

  • CloudFront-Is-Desktop-Viewer
  • CloudFront-Is-Mobile-Viewer
  • CloudFront-Is-SmartTV-Viewer
  • CloudFront-Is-Tablet-Viewer
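On the origin side, a forwarded device header is easy to consume. The sketch below makes two assumptions. The headers arrive with the values `"true"` or `"false"`, and header names are lower-cased, as Node.js does for incoming request headers. `deviceType` and `HeaderMap` are hypothetical names. A bot header of the kind wished for above would follow the same pattern: because it has only two possible values, it keeps the cache key small, unlike the raw User-Agent.

```typescript
// Sketch: an origin reading CloudFront's device-detection headers.
// Assumes the headers are forwarded with the values "true"/"false" and
// that header names are lower-cased, as Node.js does for incoming requests.
type HeaderMap = Record<string, string | undefined>;
type DeviceType = "tablet" | "smarttv" | "mobile" | "desktop" | "unknown";

function deviceType(headers: HeaderMap): DeviceType {
  const is = (name: string): boolean => headers[name.toLowerCase()] === "true";
  // Tablet is checked before mobile in case a viewer is flagged as both.
  if (is("CloudFront-Is-Tablet-Viewer")) return "tablet";
  if (is("CloudFront-Is-SmartTV-Viewer")) return "smarttv";
  if (is("CloudFront-Is-Mobile-Viewer")) return "mobile";
  if (is("CloudFront-Is-Desktop-Viewer")) return "desktop";
  return "unknown"; // headers not forwarded, or no flag set
}

console.log(deviceType({ "cloudfront-is-mobile-viewer": "true", "cloudfront-is-desktop-viewer": "false" })); // mobile
```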
Juho Rautio
Senior Consultant, Partner