If you're delivering a high-traffic site, or want to introduce resilience into your set-up, you need to deploy a load balancing configuration.
There are a lot of easy mistakes to make. In this article we’ll cover those and make sure you are on the best possible path.
Choosing a load balancer
The first step is to choose a suitable load balancer. There are many options out there and any load balancer with the right features should work. However, there are many inferior systems that will give you nothing but problems.
Here at Zengenti, we are firm believers in HAProxy. Using HAProxy we can quickly deploy on a 900MB footprint Ubuntu server. We are currently able to get a pair of HAProxy servers up and running in around an hour. We aim to get this down to minutes over the next 12 months.
While I can recommend HAProxy, it has, like much open-source software, a steep learning curve.
Despite HAProxy providing every feature you can imagine for free, I wouldn't recommend using it unless you're familiar with it. This is especially true if you only need a couple of load balancers. For us, running lots of load balancers, the learning curve was worth the effort. I would say it has taken us 2-3 years to get to where we are now!
Features required from your load balancer
There are lots of features available for load balancers. The ones I've picked out here are those we've found to be essential for a sensible and reliable deployment.
Web service monitoring
It is important that your load balancer can monitor the web service on each deployed front-end, and that it can remove a front-end from the pool if a problem arises. While this seems an essential requirement, not all load balancers provide it. Some of our largest customers, with the most expensive load balancing solutions, are unable to do this.
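In HAProxy this monitoring is configured with health checks on each backend server. A minimal sketch (the backend name, addresses, and check URL are hypothetical):

```
backend web
    option httpchk GET /healthcheck.html
    server web1 10.0.0.11:80 check inter 2000 fall 3 rise 2
    server web2 10.0.0.12:80 check inter 2000 fall 3 rise 2
```

`check` enables the probe every two seconds; after three consecutive failures (`fall 3`) HAProxy removes the server from the pool, and re-adds it after two successful checks (`rise 2`).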
Session affinity (sticky sessions)
This is one of many names for the capability to keep a user pinned to a single back-end server (assuming it hasn't failed) for the duration of their visit to the site.
The only Contensis feature that requires this is Folder Password Security. If you don't use that feature, session affinity is not essential.
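In HAProxy, session affinity can be achieved by inserting a cookie that ties a visitor to a server. A sketch (the cookie and server names are hypothetical):

```
backend web
    cookie SRV insert indirect nocache
    server web1 10.0.0.11:80 check cookie web1
    server web2 10.0.0.12:80 check cookie web2
```

If a visitor's pinned server fails its health check, HAProxy ignores the stale cookie and dispatches them to a surviving server.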
Max connection settings per backend host
We will come to this a little later, but in essence the idea is to only use secondary or tertiary servers when the load on the first server is at capacity.
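HAProxy supports this pattern directly through its `first` balancing algorithm combined with a per-server connection cap (the numbers and names here are illustrative):

```
backend web
    balance first
    server web1 10.0.0.11:80 check maxconn 200
    server web2 10.0.0.12:80 check maxconn 200
    server web3 10.0.0.13:80 check maxconn 200
```

Traffic fills web1 until it reaches its `maxconn`, then spills over to web2, and so on. This keeps the caches on the first server warm while the others sit idle until they are genuinely needed.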
Load balancer clustering
Clustering pairs the load balancer with a partner, so that if the first load balancer fails we can divert traffic to the backup. In HAProxy deployments we use UCARP to achieve this. If a head fails when using UCARP, the shared IP address moves to the backup host in an instant.
While clustering is useful from a DR perspective, it's absolutely vital for maintenance work. We can move the IP address between a pair of running load balancers during the day with no loss of service to incoming traffic. In a DR situation, where the move is not controlled, it takes a second or so and drops a few TCP connections.
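On each load balancer, the UCARP side of this looks roughly as follows (the interface, addresses, password, and script paths are hypothetical; the up/down scripts simply add or remove the shared IP on the host):

```
ucarp --interface=eth0 --srcip=10.0.0.2 --vhid=1 --pass=secret \
      --addr=10.0.0.1 \
      --upscript=/etc/vip-up.sh --downscript=/etc/vip-down.sh
```

The preferred master is given a lower `--advskew` value than the backup, and for planned maintenance the shared address can be handed over deliberately rather than waiting for a failure.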
Deciding on the number of web servers
This is a really tricky thing to get right. From years of experience I can often make an educated guess based on Google Analytics account information and other data. It is though just that – an educated guess.
The number of servers required will depend on many factors, including your current configuration, traffic levels, caching, and the quality of any bespoke code.
Many of these factors change over time, and because they fluctuate it is impossible to be precise. Our higher education clients are a good example: we provision more servers for universities during the two-week clearing period than at any other time of the year.
One approach is to load test your server. This does offer some value, but remember that you will be load testing against a known script. Internet or site visitors are not as predictable. Once again, while load testing helps, it is not precise.
What do you do then? The simple answer is over-provision, monitor and then remove servers if possible. However, you must continue to check your load so that you can add extra servers when needed.
What we can tell you is that even in one of the largest systems we host, with the right caching, no badly written bespoke code, and decent servers, we only need two web servers to service demand. We also keep a couple of extra servers in a second datacenter to deal with peaks and for DR.
In most situations we can use a pair of servers: one active and one passive. This is by far the best configuration from a performance perspective.
Load balancer configuration
There are some easy mistakes to make when load balancing Contensis. What people often do is say 'Oh, we have 4 servers. Let's just spread the load between them.' Although there are some cases where this may be a valid approach, in our experience even sites with 8M+ pageviews a month only need two active servers.
Running extra servers can play havoc with your caching and server warm-up times. Say, for example, that four users visit a site. If we have four load balanced servers in a round-robin configuration, each visitor will hit a separate server. This means that each of the four servers needs to put the requested URL into cache and potentially compile it. As a result, each first request could take several seconds rather than 100-200ms.
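The effect is easy to see with a toy model. The timings below are hypothetical stand-ins (2000ms for an uncached compile, 150ms for a cached hit), but the shape of the result is the point:

```python
from itertools import cycle

COLD_MS, WARM_MS = 2000, 150  # hypothetical first-hit vs cached response times

def first_request_times(num_servers, num_visitors):
    """Response time for each visitor's first request when the load
    balancer round-robins across num_servers initially cold servers."""
    warmed = set()
    times = []
    servers = cycle(range(num_servers))
    for _ in range(num_visitors):
        server = next(servers)
        times.append(WARM_MS if server in warmed else COLD_MS)
        warmed.add(server)  # this server's cache is now warm
    return times

# Four visitors spread round-robin across four cold servers:
# every single one pays the full cold-cache cost.
print(first_request_times(4, 4))   # [2000, 2000, 2000, 2000]

# The same four visitors hitting a single active server:
# only the very first request is slow.
print(first_request_times(1, 4))   # [2000, 150, 150, 150]
```

The more servers you spread cold traffic across, the more visitors experience the worst-case response time.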
We suggest giving your web servers the best processors you have, and the most cores you can sensibly provision in your VM environment. In most environments, a single server running as a primary is the perfect starting point. Only introduce a second if you see the CPU constantly running over, say, 65%.
If you have four active servers in the pool, all you are doing is slowing down requests. You're effectively multiplying the CPU requirements of your environment fourfold. I'm pretty sure your infrastructure guys wouldn't be too happy about that!
Keeping files up-to-date on all your webservers
Keeping files up to date across your webservers is one of the simplest problems to solve. But it's often where people make the biggest mistakes.
As Contensis has the ability to create multiple publishing servers, the obvious solution is to simply add new ones in Contensis for each load balanced web server. This will work for a tiny site, but in most scenarios it's a really bad idea.
You see, Contensis publishes to each publishing server separately. This means that while Contensis is updating one server with fresh content, another will be out of date. It also means the time taken to publish the site multiplies with every publishing server you add. There are some cases where this is useful, such as a DR configuration, but we advise never adding multiple publishing servers in Contensis for a set of load balanced web servers.
You may now be thinking, 'Well, if Contensis won’t do it for me, how do I do it?' The answer is simple. First publish your files to a known folder. You can then replicate from that folder to your front-end servers using a technology of your choice.
You can publish to a clustered file server if you have concerns about publishing resilience. But most of the time the best approach is to publish to one of the front-ends and replicate from there. The best solution really depends on how you are going to move your files about.
We will look at two approaches we use.
DFS
DFS, or Distributed File System, is a standard feature you can install on any modern Windows server. In essence, it allows you to replicate files across multiple servers with near-instant update times. The technology is easy to install, with lots of great documentation available online. We have used it successfully for 3-4 years now.
We would suggest using DFS in most scenarios. Our configuration would be something like this:
1. Create a directory called 'Websites' on the data drive of all your webservers.
2. Create a DFS replica across all servers for the entire directory.
3. Test the replica by creating a file inside D:\Websites and checking that it is replicated instantly across your other web servers.
4. Choose a web server to push content through to. This is normally a backup server rather than the primary one.
5. Create a new directory, www.mysite.com, inside the Websites directory.
6. Configure an FTP service on this server pointing to the new directory (and a second for resilience), then point Contensis at the new FTP details.
7. Now simply publish files, and watch them replicate across all your servers instantly. If they fail to replicate, check the previous steps.
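On recent Windows Server versions, the same setup can be scripted with the DFS Replication PowerShell module. A sketch of the steps above (the group, folder, server names, and paths are hypothetical):

```powershell
New-DfsReplicationGroup -GroupName "Websites"
New-DfsReplicatedFolder -GroupName "Websites" -FolderName "Websites"
Add-DfsrMember -GroupName "Websites" -ComputerName "WEB1","WEB2"
Add-DfsrConnection -GroupName "Websites" -SourceComputerName "WEB1" -DestinationComputerName "WEB2"

# WEB1 holds the authoritative copy for the initial sync
Set-DfsrMembership -GroupName "Websites" -FolderName "Websites" -ComputerName "WEB1" -ContentPath "D:\Websites" -PrimaryMember $true
Set-DfsrMembership -GroupName "Websites" -FolderName "Websites" -ComputerName "WEB2" -ContentPath "D:\Websites"

# Exclude the .bak files created by the Contensis framework
Set-DfsReplicatedFolder -GroupName "Websites" -FolderName "Websites" -FileNameToExclude "*.bak"
```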
DFS has a small learning curve thanks to the availability of documentation. One thing to be careful of is to make sure you exclude “.bak” files from replicating. This is set in the advanced DFS config. If you forget to do this, you will get issues with the .bak files created by the Contensis framework.
Once you have DFS up and running, you can pull servers out of the replica temporarily. This is how we often do upgrades. Do watch out though, because when you disable replication to a particular host it doesn't stop straight away. We always check that the setting has taken hold by creating or changing a file. This is a little frustrating, but there's no workaround at this time: DFS distributes its configuration through Active Directory, so it can take 2-15 minutes for a config change to take effect.
If you did decide to deploy to a backup server, or a standalone file server, you can disable replication when you deploy a new version of Contensis. You can then enable it again, server by server, once you are sure each server is working and healthy. If you bring them all back online simultaneously you will get problems: sites take anywhere from 10 seconds to 5 minutes to warm up and start working properly again.
We will often draw servers out of the LB config, deploy using DFS, and then push them back into the LB config once they're working. There is some work involved, but it will keep your site running beautifully during an upgrade, with no loss of service to your end users.
We use a clever trick that means we can pull a server out of the pool at IIS level.
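One common way to implement that kind of IIS-level drain (a general pattern, not necessarily the exact trick we use) is to point the load balancer's health check at a dedicated file and remove it on the server you want to take out, so IIS returns a 404 and the health check fails:

```
:: Drain this server: the health-check file must sit outside the DFS replica,
:: otherwise deleting it would take every server out of the pool at once.
del D:\health\healthcheck.html

:: ...perform the maintenance, then restore the file to rejoin the pool...
echo OK> D:\health\healthcheck.html
```

The load balancer sees the failing check, marks the server down, and drains it gracefully without anyone touching the LB configuration itself.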
Robocopy
One alternative to DFS is Robocopy. Robocopy is another Microsoft tool that ships with the OS these days. It's great at identifying changes across thousands of files. Do make sure you are on the latest version – there were some issues with earlier versions.
The process is identical to DFS in principle. The main difference is that you simply schedule a Robocopy of the data to the front-ends, rather than let DFS handle it.
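A scheduled Robocopy job pushing the published files out to a front-end might look like this (the paths, server name, and retry settings are illustrative):

```
robocopy D:\Websites \\WEB2\D$\Websites /MIR /XF *.bak /R:2 /W:5 /LOG:D:\Logs\robocopy-web2.log
```

`/MIR` mirrors the directory tree including deletions, so double-check the destination path before the first run; `/XF *.bak` skips the .bak files created by the Contensis framework; `/R` and `/W` stop a locked file from stalling the whole job.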
The advantage of Robocopy is the control it gives you over scheduling. You can schedule it to run when you want and stop it when required.
The disadvantage is that sometimes when you get to hundreds of thousands of files, it can slow down a little and use some CPU. This isn't an issue with DFS.
Robocopy was our tool of choice, but we are gradually moving across to DFS for everything we do. That said, it's certainly worth a look.