Time to first byte (TTFB) has become a common enough concern for web site operators that it now has its own acronym. The real problem with slow responses from a web server can be difficult to understand, because it has a lot more to do with human nature than with technical issues. Getting a faster response is the easy part. Getting and keeping a web audience is much more difficult.
The truth is that a slow web server response triggers two reactions in your readers and viewers. The first is the belief that your site is down, which means an almost instantaneous hard bounce for nearly all of your audience. The second is a matter of attention span: when things are slow, audiences go. The web, and especially the mobile web, has trained audiences to believe everything should be both free and instantaneous. If you aren’t delivering, some other site is. With that in mind, here are some things to watch for if you want your site to be as quick and effective as possible.
RAM
One of the key facts in software development sounds counter-intuitive, but the more experience you have, the more obvious it becomes: all software problems are ultimately hardware problems. Web servers are software, but more fundamentally, they run on available memory. If traffic exceeds your web server’s capacity, the first barrier you hit is usually running out of RAM.
The technical reasons for this are fairly simple. Every web page you serve occupies RAM, and the instance of your web server software that sends the page to the client also takes up memory. When there are too many of these pages and instances in RAM, a bottleneck develops: new pages and web server instances must wait until closed connections free up space. The first byte of your page will not be sent until there is sufficient memory, and that wait is what shows up as a slow first byte time.
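As a rough illustration of the arithmetic behind this bottleneck, the sketch below estimates how many web server instances fit in memory and how many requests would be left waiting. The function names and the sample figures (4096 MB of RAM, 1024 MB reserved for the operating system, 48 MB per instance) are assumptions chosen for the example, not measurements from any particular server.

```python
def max_concurrent_workers(total_ram_mb, reserved_mb, per_worker_mb):
    """Estimate how many server instances fit in RAM at the same time."""
    if per_worker_mb <= 0:
        raise ValueError("per_worker_mb must be positive")
    # Memory left over once the operating system and other services are served.
    usable_mb = max(total_ram_mb - reserved_mb, 0)
    return int(usable_mb // per_worker_mb)


def waiting_requests(concurrent_requests, workers):
    """Requests that must wait because every instance is already busy."""
    return max(concurrent_requests - workers, 0)


# Hypothetical server: 4096 MB total, 1024 MB reserved, 48 MB per instance.
workers = max_concurrent_workers(4096, 1024, 48)
print(workers)                         # 64
print(waiting_requests(100, workers))  # 36
```

With these assumed numbers, the 65th simultaneous visitor is the first one who waits for memory before seeing a single byte.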
CPU
Static web pages almost never trigger an overworked CPU. The reason is that serving static files does not require heavy calculation. The CPU spends most of its time idle, waiting for disk and network I/O to complete.
However, if your page depends on data, middleware, access to separate servers or anything else that might require additional calculations on the server side, a CPU bottleneck can develop in much the same way the RAM issue develops. This particular problem is a little easier to diagnose because CPU usage is much easier to monitor precisely than RAM usage. It should also be noted that database-driven and middleware-driven sites exacerbate RAM problems as well, which slows first byte time further.
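One way to tell whether a slow page is actually CPU-bound, or merely waiting on something else, is to compare CPU time with wall-clock time. The following is a minimal sketch using only the Python standard library; `profile_call` is a hypothetical helper name, and a real site would more likely rely on its server’s or framework’s own monitoring.

```python
import time


def profile_call(func, *args, **kwargs):
    """Run func and return (result, wall_seconds, cpu_seconds)."""
    wall_start = time.perf_counter()
    # process_time() counts CPU time used by this whole process (user + system)
    # and does not advance while the process sleeps or waits on I/O.
    cpu_start = time.process_time()
    result = func(*args, **kwargs)
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return result, wall_seconds, cpu_seconds
```

If the CPU time is close to the wall-clock time, the work is calculation and the CPU is the likely bottleneck. If the CPU time is a small fraction of the wall-clock time, the page is waiting on a database, another server or the network.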
Services
While it goes without saying that network capacity can slow down a web server, a more obscure problem can develop around microservices and assets or facilities drawn from other domains. Fonts, for example, when served from a central repository, can bring a web site down if the font server is overloaded or unavailable. The same goes for shared JavaScript libraries, database capacity, cloud servers, calculation engines or anything else that is “outsourced” from your server to some other domain’s machine.
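When your own server is the one fetching from an outside service, a common defensive pattern is to cap how long you are willing to wait and fall back to a local copy. The sketch below assumes the fallback bytes are already on hand; `fetch_with_fallback` and its `opener` parameter are hypothetical names introduced here so the function can be exercised without a network.

```python
import urllib.request


def fetch_with_fallback(url, fallback, timeout=2.0, opener=urllib.request.urlopen):
    """Return the remote bytes, or the local fallback if the service fails."""
    try:
        # The timeout keeps a slow outside service from stalling the page.
        with opener(url, timeout=timeout) as response:
            return response.read()
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return fallback
```

The point of the design is that an overloaded outside service costs at most the timeout, rather than holding your first byte hostage indefinitely.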
The trend towards shared services isn’t likely to slow down any time soon. That said, it is vitally important that your data and web assets be as close to the client as possible before every page load. Many popular streaming services park their most popular films and television shows in hundreds of local network operations centers for this very reason. Pulling your favorite episode across a national trunk is far more likely to generate delays and slowdowns than serving it from a NOC that is only a hop away down the street.
The key to top performance is understanding how web servers work and how their software integrates with available hardware. When the two elements work well together, time to first byte can be minimized.
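Before tuning any of the above, it helps to measure. The following is a minimal sketch of timing a request with Python’s standard `http.client` module. It assumes plain HTTP (no TLS), and it stops the clock when the status line and headers have arrived, which is a close approximation of time to first byte rather than an exact one. The function name `time_to_first_byte` is chosen for this example.

```python
import http.client
import time


def time_to_first_byte(host, port=80, path="/", timeout=5.0):
    """Return (status, seconds) from sending a GET until headers arrive."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        start = time.perf_counter()
        conn.request("GET", path)       # opens the connection and sends the request
        response = conn.getresponse()   # returns once status line and headers are read
        elapsed = time.perf_counter() - start
        response.read()                 # drain the body so the connection closes cleanly
        return response.status, elapsed
    finally:
        conn.close()
```

Because the connection is opened inside the timed region, the figure includes connection setup as well as server processing time. Running it several times and comparing quiet periods against busy ones gives a more useful picture than any single reading.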