Methodology

The goal of this research is to test the performance of specialized WordPress hosting. Performance is judged on two measures: peak performance and consistency. Peak performance is measured with k6, a load testing tool that emulates large numbers of users visiting a website and records how well the hosting responds under these stressful conditions. Consistency is measured with uptime monitoring, to confirm that the servers remain available over a longer period of time.

Some additional measurements are recorded that do not count toward the ratings. They are there to inform more advanced users and may be used to create a benchmark at a later time. WebPageTest measures geographic response times from 12 locations. WPPerformanceTester runs PHP and WordPress benchmarks to see how fast the underlying server performs different kinds of operations. The SSL test checks how secure each host’s SSL certificates are.

Uptime monitoring runs for a minimum of three months. The primary monitors were HetrixTools and Uptime Robot.

The hosting packages are as close to default as possible. In some circumstances where turning on performance enhancements is very simple, e.g. clicking an option to turn on caching, this will be done. These changes must be obvious or communicated clearly in the welcome email so that every user is guaranteed to at least be presented with the option. Otherwise, everything is left as a new customer would find it upon sign-up, to limit as much as possible any advantage companies gain from knowing they are being tested.

The exception to this rule is the Enterprise tier ($500/month+). The exception exists because, at a certain level, a higher degree of service is expected. Customers spending large sums of money are generally on-boarded and optimized by hosting companies. As such, all companies competing in the Enterprise tier are allowed to optimize the package as much as possible to maximize the performance of the site. Sites in this tier generally do well, and customers looking to buy at this level should investigate the results deeply to see which company or companies best fit their needs.

This methodology isn’t perfect. One of the most common complaints from participating companies is that it doesn’t encapsulate their service well because they do ‘something special’ for every client. This may be true, but it’s hard to allow this behavior because the difference between ‘every’ client and ‘this specific test’ isn’t measurable. The methodology also relies on good faith from the companies, because load testing servers at a meaningful scale requires permission. These types of tests often trigger security measures and can impact existing clients. Performing unauthorized load tests would be unethical and potentially damaging to other customers, which is why we only test with permission. Working with the companies being tested is often necessary to do these tests properly. Packages are compared to what was submitted, and checks for cheating are done regularly. The comparison is designed to be as apples-to-apples as possible.

The latest testing was done on WordPress 6 running the 2021 theme.

The full dummy site is available for download, with instructions, at http://wordpresshostingbenchmarks.reviewsignal.com/.

Test Configurations

All tests were performed on an identical WordPress dummy website with the same plugins except in cases where hosts added extra plugins or code.

k6 LoadStorm 

The k6 LoadStorm test is a scaling user test based on the pricing tier. The simulated users hit the homepage, hit the login page, log in, then hit all pages and posts. The test runs for 30 minutes: it scales from 1 user to n users over 20 minutes, then sustains peak load for 10 minutes. Each page load also loads all the assets (CSS/JS/images) on the page that haven’t been loaded before, to emulate a normal browser cache. One thing to note: compression isn’t enabled. I tried testing with it on, but it didn’t seem to be functioning properly, so it wasn’t included this year.

The source code to run all the k6 (LoadStorm + Static) tests is publicly available on GitHub: https://github.com/ReviewSignal/k6-WordPress-benchmarks
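As a rough illustration of the browser-cache behavior described above (this is not taken from the published repository), the idea can be sketched in plain JavaScript. The helper name `assetsToFetch` and the asset paths are hypothetical; the sketch only shows the bookkeeping of fetching each asset once per simulated user.

```javascript
// Hypothetical helper: returns the assets a simulated user still needs to
// fetch for a page, and records them as cached for later page loads.
function assetsToFetch(pageAssets, cache) {
  const missing = []
  for (const url of pageAssets) {
    if (!cache.has(url)) {
      cache.add(url) // remember it so it is not requested again
      missing.push(url)
    }
  }
  return missing
}

// One cache per simulated user, reused across that user's page loads.
// The asset paths below are made up for the example.
const userCache = new Set()
const homeAssets = assetsToFetch(['/style.css', '/app.js', '/logo.png'], userCache)
const postAssets = assetsToFetch(['/style.css', '/app.js', '/photo.jpg'], userCache)

console.log(homeAssets.length) // 3
console.log(postAssets) // [ '/photo.jpg' ]
```

The second page load only requests the one asset the user has not seen yet, which is the effect the test aims for.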

Price Tier ($/month)    # Users
<25                     1,000
25-50                   1,000
51-100                  1,500
101-200                 2,000
201-500                 2,500
Enterprise              5,000
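The ramp described for this test (20 minutes up to n users, then 10 minutes at peak) maps naturally onto a k6-style `stages` array. The following is a minimal sketch under that assumption, not the script actually used; `LOADSTORM_USERS` and `loadStormStages` are hypothetical names, and the starting count of 1 user is not configured here.

```javascript
// Peak users per price tier, copied from the table above.
const LOADSTORM_USERS = {
  '<25': 1000,
  '25-50': 1000,
  '51-100': 1500,
  '101-200': 2000,
  '201-500': 2500,
  Enterprise: 5000,
}

// Hypothetical helper: builds a k6-style `stages` array for one price tier.
function loadStormStages(tier) {
  const peak = LOADSTORM_USERS[tier]
  if (peak === undefined) {
    throw new Error(`Unknown price tier: ${tier}`)
  }
  return [
    { duration: '20m', target: peak }, // ramp up to peak over 20 minutes
    { duration: '10m', target: peak }, // sustain peak load for 10 minutes
  ]
}

console.log(loadStormStages('51-100'))
```

The two stages add up to the 30-minute test duration, and only the peak changes between tiers.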

k6 Static (Formerly Load Impact)

k6 ran a script that requested the front page of the test site. It scaled from 1 to n,000 users, based on the price tier, over a 15-minute duration. It was designed to emulate the old Blitz.io test of simply hammering the cache.

Price Tier ($/month)    # Users
<25                     1,000
25-50                   1,000
51-100                  2,000
101-200                 3,000
201-500                 4,000
Enterprise              5,000

Script:

import { sleep } from 'k6'
import { Rate } from 'k6/metrics'
import http from 'k6/http'

// let's collect all errors in one metric
let errorRate = new Rate('error_rate')

// See https://k6.io/docs/using-k6/options
export let options = {
  batch: 1,
  throw: true,
  stages: [
    // Ramp up over 15 minutes; the target is the user count for the price tier
    { duration: '15m', target: 1000 },
  ],
  ext: {
    loadimpact: {
      distribution: {
        Virginia: { loadZone: 'amazon:us:ashburn', percent: 10 },
        London: { loadZone: 'amazon:gb:london', percent: 10 },
        Frankfurt: { loadZone: 'amazon:de:frankfurt', percent: 10 },
        Oregon: { loadZone: 'amazon:us:portland', percent: 10 },
        Ohio: { loadZone: 'amazon:us:columbus', percent: 10 },
        Tokyo: { loadZone: 'amazon:jp:tokyo', percent: 10 },
        Sydney: { loadZone: 'amazon:au:sydney', percent: 10 },
        Mumbai: { loadZone: 'amazon:in:mumbai', percent: 10 },
        Singapore: { loadZone: 'amazon:sg:singapore', percent: 10 },
        Brazil: { loadZone: 'amazon:br:sao paulo', percent: 10 },
      },
    },
  },
}

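export default function () {
  // Custom request header sent with every request
  let params = {
    headers: { 'X-CustomHeader': '1' },
  }
  // Request the front page; replace the placeholder URL with the site under test
  let res = http.get('https://example.com', params)

  // Count any 4xx/5xx response as an error
  errorRate.add(res.status >= 400)

  // Pause for one second before this virtual user's next iteration
  sleep(1)
}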
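The script above hard-codes a 1,000-user target and ten load zones at 10% each. One way the per-tier variation could be generated is sketched below in plain JavaScript. This is an illustration only: `STATIC_USERS`, `LOAD_ZONES`, and `staticOptions` are hypothetical names, and the even split across zones is assumed from the script rather than stated elsewhere in the methodology.

```javascript
// Peak users per price tier for the static test, copied from the table above.
const STATIC_USERS = {
  '<25': 1000,
  '25-50': 1000,
  '51-100': 2000,
  '101-200': 3000,
  '201-500': 4000,
  Enterprise: 5000,
}

// Load zones copied from the script above.
const LOAD_ZONES = {
  Virginia: 'amazon:us:ashburn',
  London: 'amazon:gb:london',
  Frankfurt: 'amazon:de:frankfurt',
  Oregon: 'amazon:us:portland',
  Ohio: 'amazon:us:columbus',
  Tokyo: 'amazon:jp:tokyo',
  Sydney: 'amazon:au:sydney',
  Mumbai: 'amazon:in:mumbai',
  Singapore: 'amazon:sg:singapore',
  Brazil: 'amazon:br:sao paulo',
}

// Hypothetical helper: builds an options object shaped like the one in the
// script, with the target set per tier and traffic split evenly across zones.
function staticOptions(tier) {
  const target = STATIC_USERS[tier]
  if (target === undefined) {
    throw new Error(`Unknown price tier: ${tier}`)
  }
  const names = Object.keys(LOAD_ZONES)
  const percent = 100 / names.length // 10 zones, so 10% each
  const distribution = {}
  for (const name of names) {
    distribution[name] = { loadZone: LOAD_ZONES[name], percent }
  }
  return {
    batch: 1,
    throw: true,
    stages: [{ duration: '15m', target }],
    ext: { loadimpact: { distribution } },
  }
}

console.log(staticOptions('201-500').stages)
```

Keeping the tier table and zone list as data means a single function can produce the options for every tier without editing the script by hand.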

WebPageTest.org

Tests were run from 12 locations. 2021 test locations were Virginia, California, Salt Lake City, London, Frankfurt, Cape Town, Singapore, Mumbai, Tokyo, Sydney, Brazil, and Dubai using EC2 instances where possible. The number you see is Largest Contentful Paint (LCP).

Uptime Monitoring

Uptime was monitored for at least three months for the homepage of the site. HetrixTools and Uptime Robot were used to monitor uptime at 1 minute intervals.
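To give a sense of scale for 1-minute checks, the arithmetic can be sketched as follows. This is an illustrative calculation, not the monitors' own method: the function names are hypothetical, and "three months" is assumed to be 90 days. Against the 99.9% uptime bar used in the ratings, that window allows roughly 130 minutes of downtime.

```javascript
// Hypothetical helper: uptime percentage from a count of monitor checks.
function uptimePercent(totalChecks, failedChecks) {
  if (totalChecks <= 0) {
    throw new Error('totalChecks must be positive')
  }
  return ((totalChecks - failedChecks) / totalChecks) * 100
}

// Hypothetical helper: minutes of downtime allowed over a monitoring window
// while still meeting a target uptime percentage.
function allowedDowntimeMinutes(days, targetPercent) {
  const totalMinutes = days * 24 * 60
  return (totalMinutes * (100 - targetPercent)) / 100
}

// Assumption: 90 days of 1-minute checks is 129,600 checks.
const checksIn90Days = 90 * 24 * 60
console.log(allowedDowntimeMinutes(90, 99.9).toFixed(1)) // "129.6"
console.log(uptimePercent(checksIn90Days, 129) >= 99.9) // true
console.log(uptimePercent(checksIn90Days, 130) >= 99.9) // false
```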

WPPerformanceTester

The plugin runs its built-in performance test. It is available at WordPress.org.

Qualys SSL Report Grade

The tool is available at https://www.ssllabs.com/ssltest/. Every single B grade this year was for the following reason according to Qualys: “This server supports TLS 1.0 and TLS 1.1. Grade capped to B.”

Ratings

There are two levels of recognition awarded to companies that participate in the tests. There is no ‘best’ declared; the results are simply tiered, because the complex nature of hosting makes it hard to come up with an objective ranking system. These tests also don’t take into account outside factors such as reviews, support, and features. They simply test performance as described in the methodology.

Top Tier WordPress Hosting (Full Star)

This is awarded to companies that maintain 99.9% uptime throughout the entire testing period and show little to no performance degradation during load testing, judged primarily on error rate and consistent response times. Error rates above 0.1% and response times above 1000ms* will keep a company from achieving Top Tier marks.

Honorable Mention (Half Star)

Honorable mentions are given to companies that came close to Top Tier status but for one reason or another fell just slightly short. This is for companies struggling slightly on a load test – a low but not insignificant number of errors, slightly increased response times or spikes during the test but an otherwise solid performance.

* There is some slight wiggle room on the 1000ms limit for the p95 metric on the LoadStorm test, because I noticed this hard limit may slightly favor US-based hosting companies. If a company experiences no performance degradation but has a p95 slightly above 1000ms, it may still earn Top Tier status if it’s based in a non-US location.
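The hard thresholds named in this section can be written down as a simple check. This is only a sketch of the stated numbers: the function and field names are hypothetical, response time is assumed to mean the p95 value mentioned in the footnote, and the real ratings also involve judgment (the wiggle room for non-US hosts and the Honorable Mention tier are not modeled).

```javascript
// Hypothetical check of the stated Top Tier thresholds only.
// Assumes `p95Ms` is the p95 response time from the load test.
function meetsTopTierThresholds({ uptimePercent, errorRatePercent, p95Ms }) {
  const uptimeOk = uptimePercent >= 99.9 // 99.9% uptime throughout testing
  const errorsOk = errorRatePercent <= 0.1 // error rates above 0.1% disqualify
  const latencyOk = p95Ms <= 1000 // response times above 1000ms disqualify
  return uptimeOk && errorsOk && latencyOk
}

// Made-up example results, not real test data.
console.log(meetsTopTierThresholds({ uptimePercent: 99.99, errorRatePercent: 0.02, p95Ms: 450 })) // true
console.log(meetsTopTierThresholds({ uptimePercent: 99.99, errorRatePercent: 0.5, p95Ms: 450 })) // false
```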

Changelog

2023

  • Increased participation fees by $250 for each price tier (e.g. <25 went from $100 to $350, Enterprise went from $500 to $750).
  • Updated LoadStorm scripts adding compression, updated filters, metric class, documentation
  • Introduced Sponsorship Program to sponsor new web hosts [Thanks to 20i, Krystal Hosting, Nexcess, Pressable for sponsoring new host(s)]

2022

  • LoadStorm went out of business, so all load tests are now being run using k6. The LoadStorm test was recreated in k6 to emulate the same behavior and user pattern as closely as possible.
  • The Internet.nl and Mozilla Observatory tests have been removed. These general security tests didn’t add much value and a lot of the aspects tested are generally beyond a web hosting company’s scope.
  • The alternative uptime monitor (StatPing) ended up failing and ultimately wasn’t used.
  • WPPerformanceTester 2.0 used. Historical scores cannot be compared to new scores.
  • WebPageTest scores are now Largest Contentful Paint (LCP)
  • Changed Top Tier criteria slightly to accommodate p95 for LoadStorm and potential geographic bias.

2021

  • Updated theme to 2021 theme.
  • Replaced Load Impact with k6 (same company, new product)
  • WPT locations switched based on availability (Salt Lake City, Cape Town, Dubai)
  • Added Internet.nl and Mozilla Observatory tests.

2020

  • Added additional testing location (Israel) on WebPageTest increasing locations from 11 to 12.
  • Self-hosted an uptime monitor as a third (backup) to compare against, if discrepancies existed between UptimeRobot and StatusCake, using PHP Server Monitor.

Costs and Price Tiers

These tests are expensive to run and I refused to accept any hosting sponsorships. There is a participation fee involved to cover the costs. Every company pays the same amount based on the price tier of the product entered into the testing.

There is a re-testing fee if the load tests require more than two attempts for any reason. Load testing is the primary cost of running these tests. If a company fails twice for a legitimate reason (almost exclusively security related), re-testing is allowed so that security measures can be dealt with, at the re-testing fee for its tier.

All fees paid will be documented publicly for posterity.

The table below lists all six price tiers, the testing fee associated with each tier and the re-testing fee associated with failed load tests. Plans are placed in a tier by their retail price; no first-month, first-year, first-billing-period, or sale prices are used to determine which tier a plan belongs in.

Pricing Tier                Testing Fee    Re-Testing Fee
<$25/month                  $350           $50
$25-50/month                $350           $50
$51-100/month               $400           $75
$101-200/month              $450           $100
$201-500/month              $500           $125
Enterprise ($500/month+)    $750           $250
WooCommerce                 $500           $125

Testing Fee Disclosures

All companies paid the same fee based on the pricing tier they competed in. The following companies were re-tested and the number of tests and associated fees are documented below.

Company                Extra Tests          Total ($)
ManagedWPHosting       1 (<25)              50
20i                    1 (<25)              50
34SP.com               1 (<25)              50
SiteGround             3 (<25, 50, 200)     200
GreenGeeks             1 (<25)              50
Rocket.net             2 (Ent)              500
LeverageWP             1 (200)              100
Performant Websites    1 (Ent)              250
Nexcess                3 (<25, 500, Woo)    300
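The totals above follow from the re-testing fees in the price tier table. A small sketch of that arithmetic is shown below. The names are hypothetical, and it assumes the shorthand in the Extra Tests column ("50", "200", "500", "Ent", "Woo") refers to the $25-50, $101-200, $201-500, Enterprise, and WooCommerce tiers respectively.

```javascript
// Re-testing fee per tier in dollars, copied from the price tier table.
const RETEST_FEE = {
  '<25': 50,
  '25-50': 50,
  '51-100': 75,
  '101-200': 100,
  '201-500': 125,
  Enterprise: 250,
  WooCommerce: 125,
}

// Hypothetical helper: total re-testing fee for a list of re-tested tiers.
function retestTotal(tiers) {
  return tiers.reduce((sum, tier) => {
    const fee = RETEST_FEE[tier]
    if (fee === undefined) {
      throw new Error(`Unknown price tier: ${tier}`)
    }
    return sum + fee
  }, 0)
}

console.log(retestTotal(['<25', '25-50', '101-200'])) // SiteGround: 200
console.log(retestTotal(['Enterprise', 'Enterprise'])) // Rocket.net: 500
console.log(retestTotal(['<25', '201-500', 'WooCommerce'])) // Nexcess: 300
```

Under that reading of the shorthand, each computed total matches the figure disclosed in the table.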

Notes – Changes made to Hosting Plans

Almost every company had to disable security measures of some sort.

Many companies had caching turned on via a click, welcome email instructions, or another obvious way that was presented clearly to new users. Some of the instructions might be inside wp-admin; others had features in the control panel to turn on caching or performance optimization. If it wasn’t made explicitly obvious, it did not count as default.

Deprecated Test Information

LoadStorm (Out of business)

The LoadStorm test was a scaling user test based on the pricing tier. The simulated users hit the homepage, hit the login page, logged in, then hit a few pages and posts. The test ran for 30 minutes: it scaled from 500 users to n,000 users over 20 minutes, then sustained peak load for 10 minutes.

Price Tier ($/month)    # Users
<25                     2,000
25-50                   2,000
51-100                  3,000
101-200                 4,000
201-500                 5,000
Enterprise              10,000

Internet.nl (Deprecated)

The tool is available at https://internet.nl/test-site/. The test is designed to check if the server is up to date with latest security settings and gives a score.

Mozilla Observatory (Deprecated)

The tool is available at https://observatory.mozilla.org/. The test is designed to give a grade based on how safe and secure the server configuration is.


About the Author

Kevin Ohashi

Kevin Ohashi is the geek-in-charge at Review Signal. He is passionate about making data meaningful for consumers. Kevin is based in Washington, DC.
