
SEO Friendly React.js App Architecture for Multimedia Content

by on 9th March 2016

There’s an issue with modern content creation for the web: non-technical users want to be able to create mixed media content, with the option of embedding custom components where needed. However, traditional CMS tools don’t have the middleware to support that idea.

We decided to sit down and create a prototype middleware in React.js to allow for that kind of content creation and publication, which would be search-friendly.

If you’re dealing with Angular/React/other JS framework projects handling multimedia content, this may save you some headaches.

Note: this post is mostly aimed at developers who need to build search-friendly JS applications. If you’re not a developer but you’re interested in the outcome of the project, take a look at my work in progress, linked in the Demo section at the end of this post.

Framing the Problem

As you may know, we do a lot of research and development work around understanding the search implications of emerging technologies, such as SEO for JavaScript-based projects, React.js based SEO, Accelerated Mobile Pages, Instant Articles and so on. As part of that, I’ve recently been working on a concept for creating long form article content more easily, capable of utilising both standardised and custom components.

Increasingly, developers want to build interfaces to systems in JavaScript only. The appeal is obvious – greater power and flexibility in what can be offered, reduced development time, and, with React Native in particular, an easy way to create a web and app-based front end at the same time. The downsides, however, are just as clear: whilst the Google engineering team is getting better at rendering JavaScript, they’re still a long way from perfect.

To address this we came up with an idea: what if you could provide a data source which contained all the information about what should be rendered, without HTML, and the system could sort out what the output should be? A user would be able to use HTML or markdown if they wanted, but equally if they wanted an image gallery or a video embed, they could simply call for that and everything after that would be handled for them. Similarly, when it came to the application’s output, the search considerations of rendering JS could be handled automatically for them, meaning they wouldn’t have to worry about those issues.

The implications for the speed of execution are exciting, as are the implications for providing a platform for less technical marketers to create exceptional content.

Inspiration from Prior Art

What we’re describing here isn’t too dissimilar in concept to Ghost. If you’re not familiar with it, it’s a JavaScript based blogging platform built on Node.js. What it doesn’t do though is allow for arbitrary content modules to be added in. Instead, it simply allows for content in the form of Markdown, with an admin area similar to WordPress or any other blogging platform.

On the other end of the scale, I recently came across a newsroom CMS which does away with the idea of a page, and instead uses blocks of content. When a journalist is creating or editing a piece, they simply add in another block, which could be an image, text, or video, or something else entirely. However, the system in question used a lot of different technologies to achieve this, and only rendered in HTML, requiring custom JavaScript to be called in, leading to pages which become very heavy very fast.

What if you could combine those ideas? Something entirely based in JavaScript, using modules to compose content? That’s what we set out to create.

Technology Choices

In terms of front end tooling, I’m most familiar with React, so that’s what our solution is built with. Also, in the interests of making everything as simple as possible, our demo application would use a single flat file with JSON. But how to make the actual middleware work?

First Principles

The most basic element of a site is its data structure, which gives rise to its architecture. On the web, the most basic unit of that data is the URL, so we decided that the first level in the JSON object would be the post slug. This gave a very simple way of pulling the correct data for any page – simply get the object associated with that key. It’s also a concept that’s easy to map to larger data storage methods.

With that decided, object properties would be the most basic things we’d require at that point: a title, a description, and an array of components relevant to that URL. The array would hold a set of objects, each containing the name of the component to be used and the props for that component.
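A minimal sketch of that structure, assuming made-up slugs, titles and component names (none of these are taken from the demo's actual data file):

```javascript
// Hypothetical flat-file content, keyed by post slug.
const content = {
  "man-on-the-moon": {
    title: "Man on the Moon",
    description: "A long-form piece built from composable blocks.",
    components: [
      { component: "Markdown", props: { source: "# One small step" } }
    ]
  }
};

// Pulling the data for a page is a single key lookup.
// Returns null when no post exists for the slug.
function getPost(slug) {
  return Object.prototype.hasOwnProperty.call(content, slug)
    ? content[slug]
    : null;
}
```

Because the lookup is just "slug in, object out", the same function signature could sit in front of a database later without the rest of the application changing.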

For example, you might get an object like this in the components array:
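As an illustration only (the component name, prop names and placeholder values here are hypothetical, not taken from the demo):

```javascript
// Hypothetical entry in a post's `components` array:
// the name of the component to use, plus the props to hand to it.
const entry = {
  component: "YouTube",
  props: {
    videoId: "VIDEO_ID",       // placeholder, not a real video
    title: "Launch footage"
  }
};
```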

What this would do is fairly obvious, but for the avoidance of doubt, here’s an implementation of something that could work with that component…

Note: ES6, because Babel exists. Also, all the object properties are named as the names of the React component properties, for ease of understanding.
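As a rough, framework-free stand-in: in a React codebase this would be a component returning JSX, but the plain function below has the same shape (props in, markup out). The `videoId` and `title` prop names are assumptions for illustration.

```javascript
// Escape text before placing it inside an HTML attribute.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Stand-in for a React component: takes props, returns markup as a string.
// The property names match the keys used in the data object's `props`.
const YouTube = ({ videoId, title }) =>
  `<iframe src="https://www.youtube.com/embed/${encodeURIComponent(videoId)}" ` +
  `title="${escapeHtml(title)}" allowfullscreen></iframe>`;
```

Calling `YouTube(entry.props)` with an object shaped like the one above would produce the embed markup for that video.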

We could now simply create a tiny factory capable of rendering any given component, given the required data, and then iterate through our list of components and output a page of content. Anything from a YouTube video to a custom component specific to that page could be called in and rendered, and due to the nature of React, only what’s actually required would need to be called.

That might look something like:
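A minimal sketch of such a factory, again without React so it runs on its own. The registry contents and function names are hypothetical; in a React app the registry would map names to components and the factory would call `React.createElement` instead of a plain function.

```javascript
// Hypothetical registry mapping component names to render functions.
// Text is not escaped here, purely to keep the sketch short.
const registry = new Map([
  ["Heading", ({ text }) => `<h2>${text}</h2>`],
  ["Paragraph", ({ text }) => `<p>${text}</p>`]
]);

// Tiny factory: look up the named component and call it with its props.
function renderComponent({ component, props }) {
  const render = registry.get(component);
  if (typeof render !== "function") {
    throw new Error(`Unknown component: ${component}`);
  }
  return render(props);
}

// Walk the components array in order and join the output into one page body.
function renderComponents(components) {
  return components.map(renderComponent).join("\n");
}
```

Adding a new kind of content block then means registering one more entry, with no change to the loop that builds the page.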

Application SEO Considerations

Of course, we want our applications to be search-friendly. Fortunately, with React, we can take our entire component tree and render it to an HTML string on the server with ReactDOMServer’s renderToString call, so crawlers receive complete markup without having to execute any JavaScript. For other frameworks we could use something like Prerender.io.
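To make the server-side step concrete, here is a sketch of wrapping pre-rendered body markup in a full document. The `page` object and `renderDocument` helper are hypothetical; in a React app the `body` string would be the return value of `renderToString`, which is hard-coded here so the example runs without React installed.

```javascript
// Hypothetical page data for one slug.
const page = {
  title: "Man on the Moon",
  description: "A long-form piece built from composable blocks.",
  // In a React app this string would come from ReactDOMServer.renderToString.
  body: "<p>One small step.</p>"
};

// Wrap pre-rendered body markup in a complete HTML document, so a crawler
// receives the title, description and content in the initial response.
function renderDocument({ title, description, body }) {
  return [
    "<!doctype html>",
    "<html>",
    `<head><title>${title}</title><meta name="description" content="${description}"></head>`,
    `<body><div id="app">${body}</div></body>`,
    "</html>"
  ].join("\n");
}
```

The title and description come straight from the same data object that drives the components, so the search-facing metadata cannot drift out of step with the page content.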

Component SEO Considerations

Because of the nature of a modern JS-based front end, you can often have components which don’t render all of their content. For example, where an older jQuery-based tabbing solution would have had all of the content in place all the time, and shown or hidden content using CSS to display the appropriate piece, something like React will often allow you to simply create or destroy elements as required.

As a result, it’s worth consulting with the person commissioning the content as to exactly what needs to be indexable. Obviously, components which alter what is displayed by directly changing the output can cause issues from an indexation standpoint. Make sure you’re all clear on what’s expected of the application in terms of its rendering to avoid building the application twice.
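The difference between the two tabbing approaches can be sketched with plain strings. The tab data and function names below are hypothetical and stand in for real components:

```javascript
// Hypothetical tab data.
const tabs = [
  { id: "first", content: "First panel text" },
  { id: "second", content: "Second panel text" }
];

// Show/hide approach: every panel is in the markup, inactive ones are hidden.
function renderAllPanels(panels, activeId) {
  return panels
    .map(({ id, content }) =>
      `<section id="${id}"${id === activeId ? "" : " hidden"}>${content}</section>`)
    .join("");
}

// Create/destroy approach: only the active panel exists in the output.
function renderActivePanel(panels, activeId) {
  const active = panels.find(({ id }) => id === activeId);
  return active ? `<section id="${active.id}">${active.content}</section>` : "";
}
```

With the first function, the text of every tab is present in the rendered output; with the second, only the active tab's text is, which is exactly the kind of difference worth agreeing on before building.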

Data Considerations

Obviously, for something larger than a proof of concept you’d want more than a single JSON file. Any database will suffice, but one designed to store flat-file documents or JSON-like data in the first place would be the obvious first choice. Beyond that, you’d need to consider routing for dynamic pages: if you wanted a list of content items to be pulled in, for example, that would probably warrant archive pages too. These would also be buildable from composable components, so you could still use this approach, but you would include a pageType attribute at the top level of the application’s URL routing and feed data in accordingly.
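One possible reading of the pageType idea, sketched with a hypothetical in-memory store. Placing `pageType` on each slug's entry is an assumption, as are the slugs and the `resolvePage` helper:

```javascript
// Hypothetical content store: each entry declares its pageType.
const content = {
  "apollo-11": { pageType: "post", title: "Apollo 11", components: [] },
  "apollo-12": { pageType: "post", title: "Apollo 12", components: [] },
  "missions": { pageType: "archive", title: "All missions" }
};

// Resolve a slug to the data its page needs. Archive pages get a list of
// posts built from the same store rather than hand-written content.
function resolvePage(slug) {
  if (!Object.prototype.hasOwnProperty.call(content, slug)) return null;
  const page = content[slug];
  if (page.pageType === "archive") {
    const items = Object.entries(content)
      .filter(([, entry]) => entry.pageType === "post")
      .map(([postSlug, entry]) => ({ slug: postSlug, title: entry.title }));
    return { ...page, items };
  }
  return page;
}
```

The archive's `items` could then be handed to a list component like any other props, so archive pages stay composable in the same way as posts.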

This reduces the concept of the application to a blob of state at the top, which should flow down through the application like a good React app should. State can be held as high up in the system as it needs to be, so all child components can know what they’re supposed to be rendering. Flowing in new data at any point simply updates state at the top, moving down through the app to produce new output. The new data would simply change what components were called and in what order, with different data being given to them.
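The "blob of state at the top" idea can be sketched as output being a pure function of state. The `renderApp` and `createStore` names are hypothetical, and the string output stands in for a React render:

```javascript
// Output is a pure function of the state held at the top of the app.
function renderApp(state) {
  const blocks = state.components.map(({ component, props }) =>
    `<section data-component="${component}">${props.text}</section>`);
  return `<h1>${state.title}</h1>${blocks.join("")}`;
}

// Hypothetical store: new data is merged into the state at the top,
// and the whole output is recomputed from it and passed to onChange.
function createStore(initialState, onChange) {
  let state = initialState;
  return {
    getState: () => state,
    setState(next) {
      state = { ...state, ...next };
      onChange(renderApp(state));
    }
  };
}
```

Feeding in a new `components` array changes which components are called, in what order, and with what data, without any component needing to know where the data came from.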

Demo

To see a demo of this concept in action, have a look at Fly me to the Moon. For a page which uses both re-used and single use components, take particular note of the Man on the Moon page.

GitHub repo: https://github.com/Builtvisible/fly-me-to-the-moon


Responses

  1. Pete, thanks for sharing this. I have a few questions that have come up with a recent client, and your brain seems to be the one to pick. ;-) There are dynamic JS content elements on the page (related video carousels), which they do NOT want Google to render because the content includes a description of a video that also appears on the video page (not unlike putting the entire product description on a JS-driven category page in the eCommerce world). My advice was that you neither want to rely on JavaScript to display content you want indexed, nor to hide content you do NOT want indexed. Theoretically the rest of the content can be pre-rendered (if the entire page is JavaScript based) for search engines, but not the carousel. However, Google is already telling webmasters not to use a pre-rendering service, and rendering a different experience for them would, in my opinion, be considered cloaking.

    I’m inclined to tell them to either include it or not include it, but pre-render the entire page – including that element – and leave it at that. If you’re worried about duplicate content, just don’t put it on the page. Use an alternate description, or no description at all.

    So my question is: What Would Pete Do?

  2. Hey Everett,

    Tricky one. My short answer would be just render it without the video carousel, and lazy-load it in when the user gets to it, with the actual source being swapped out as the user navigates through the items. That way, you only ever have one description, rather than all of them. It’s unlikely for the moment that Google will pick up the text at all (as it’s not loaded until reaching that item), and when they do get good enough to crawl it, you’ll only be displaying one of the snippets and not all of them.

    To take that one step further, I’d look at having a secondary description only used in carousels. So a short one for snappy text where you need that sort of copy, like related media links, and a full description for when the item itself is viewed. Again, that means that the only snippet that will be duplicated is one that doesn’t reside on the individual item page.

    Hope that helps!

  3. Yes, that definitely helps Pete. We had already recommended using a shorter description for the carousel, but I hadn’t thought about the lazy-loading aspect.

    Cheers!

    Everett

  4. Hi Pete,

    based on your sample app on GitHub (fly-me-to-the-moon), I’m interested to learn more about how React can handle “content pages”. You made me discover “renderToString”, which is quite interesting, because we are looking to redesign a highly dynamic website that also has its share of content pages, and I would like to see if React can handle this too.

    My concern is that we have a multi-language site with a geolocation concept, both of which can change the content of the page for a specific URL.

    (For example: /buyer => if in session language is set to EN or FR, and if we are in Canada, the content won’t be the same.)

    By having something pre-generated by the server (with renderToString), I understand that I cannot inject any variables to this. The only solution I see is to make my url “verbose”, and having content-ready (pre-generated) for every possibility.

    (I would be assuming that a better url can resolve my problem, example: /ca/fr/buyer)

    What are your thoughts regarding this? Is there any way to preserve a simple URL (/buyer) and just handle the context in React somehow?

    Thanks a lot!

  5. Hey Jeff,

    As you rightly point out, the simple solution to this is to make it so that you have verbose URLs, with country and language preferences set. This has advantages for search as much as anything else, as it means you have multiple copies which can be indexed for the right versions of a search engine for different languages. So for example, Google FR can have the French language, French country version, whilst Google CA can have the French Canadian version.

    Beyond that, if you want to go down the single URL version, there’s an option – you can monitor traffic volume to each page, and default new visitors to the most often seen country/language combination. That will get the most common version to be the one indexed. However, you can also look for a logged in Facebook cookie, and if the user is logged in, pull their information and from that get their location. Then you can load up the most likely combination based on that knowledge.

    Whilst that’s the most “clever” solution, having the country and language in the URL is the best, and simplest.
