AWS T2 CPU Credit Balance and Real CPU Workload Explained

AWS releases new products at an astounding rate, making it hard for users to keep up with best practices and use cases for those services. The risk for IT teams is that they will miss out on the release of AWS services that can improve business operations, save them money, and optimize IT performance.

Let’s revisit a service that I think is particularly underutilized. Amazon’s T2 instance types are not new, but they can seem complicated to someone who is not intimately familiar with them. In the words of Amazon, “T2 instances are for workloads that don’t use the full CPU often or consistently, but occasionally need to burst to higher CPU performance.” This definition seems vague to me. What happens when the instance uses the CPU more than “often”? How does that show up in actual performance? And how do you reconcile wildly varying CloudWatch and OS statistics, like those below?

[Screenshots: CloudWatch and OS CPU statistics]
Read on!

Amazon explains that “T2 instances’ baseline performance and ability to burst are governed by CPU Credits. Each T2 instance receives CPU Credits continuously, the rate of which depends on the instance size. T2 instances accrue CPU Credits when they are idle, and use CPU Credits when they are active. A CPU Credit provides the performance of a full CPU core for one minute.” So the instance is constantly “fed” CPU Credits, and consumes them when the CPU is active. If the rate of consumption is less than the rate of feeding, the CPUCreditBalance (a metric visible in CloudWatch) will increase. Otherwise it will decrease (or stay the same).
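The feeding-and-consuming behaviour described above can be sketched as a small simulation. This is a minimal illustration rather than Amazon's actual accounting: the function name and parameters are made up for the example, the balance is simply floored at zero, and no upper cap on the balance is modelled.

```python
def simulate_credit_balance(earn_per_hour, usage_per_minute, start_balance=0.0):
    """Track a CPU Credit balance minute by minute.

    earn_per_hour: Credits fed to the instance each hour.
    usage_per_minute: Credits consumed in each minute (1.0 = one vCPU at 100%).
    Simplifying assumptions: the balance is floored at zero, has no upper
    cap, and demand is never throttled.
    """
    earn_per_minute = earn_per_hour / 60.0
    balance = start_balance
    history = []
    for used in usage_per_minute:
        # Credits arrive continuously; usage is subtracted from the balance.
        balance = max(0.0, balance + earn_per_minute - used)
        history.append(balance)
    return history


# Ten idle minutes, then five minutes with one vCPU fully busy.
usage = [0.0] * 10 + [1.0] * 5
history = simulate_credit_balance(earn_per_hour=24, usage_per_minute=usage)
print([round(b, 2) for b in history])
```

With 24 Credits per hour, the balance climbs by 0.4 per idle minute and drops by 0.6 for every minute one vCPU runs flat out.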

Let’s make this less abstract: looking at a T2.medium, Amazon says it has a baseline allocation of 40% of one vCPU, and earns Credits at the rate of 24 per hour (each credit representing one vCPU running at 100% for one minute; so earning 24 Credits per hour allows you to run the instance at the baseline of 40% of one vCPU). This allocation is spread across the two cores of the T2.medium instance. An important thing to note is that the CPU Credits are used to maintain your base performance level – the base performance level is not given in addition to the Credits you earn. So effectively this means that you can maintain a CPU load of 20% on a dual core T2.medium (as the two cores at 20% combine to the 40% baseline allocation). In real life, you’ll get slightly more than 20%, as sometimes you will be completely out of Credits – but Amazon will still allow you to do the 40% baseline work. Other times you will briefly have a Credit balance, and you’ll be able to get more than the baseline for a short period of time.
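The arithmetic behind those percentages is easy to check. The helper below is only an illustration (its name is not from any AWS tool); it relies solely on the definition that one Credit is one vCPU at 100% for one minute.

```python
def baseline_from_credits(credits_per_hour, vcpus):
    """Convert an hourly Credit allocation into sustainable CPU fractions."""
    # One Credit = one vCPU-minute, and there are 60 minutes in an hour.
    fraction_of_one_vcpu = credits_per_hour / 60.0
    # The same allocation spread evenly across all vCPUs.
    fraction_per_vcpu = fraction_of_one_vcpu / vcpus
    return fraction_of_one_vcpu, fraction_per_vcpu


total, per_core = baseline_from_credits(credits_per_hour=24, vcpus=2)
print(f"{total:.0%} of one vCPU, or {per_core:.0%} on each of 2 vCPUs")
```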

For example, take a T2.medium instance running a workload heavy enough that it has used all its Credits. The LogicMonitor CloudWatch monitoring graphs show that Amazon thinks this instance is constantly running at 21.7%.

This instance is consuming 0.43 CPU Credits per minute (with a constant balance of zero, so it is consuming Credits as fast as they are allocated). So, in fact, this instance is getting 25.8 Credits per hour (0.43 × 60 minutes), not the theoretical 24.
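As a quick sanity check, the consumption figure can be converted back into a utilization percentage. The sketch below uses only the numbers quoted above; the function name is illustrative. The result lands at about 21.5% across two vCPUs, close to the 21.7% CloudWatch reports; the small gap is presumably down to rounding in the 0.43 figure.

```python
def credit_consumption_summary(credits_per_minute, vcpus, nominal_credits_per_hour):
    """Relate an observed Credit burn rate to hourly totals and utilization."""
    credits_per_hour = credits_per_minute * 60
    return {
        "credits_per_hour": credits_per_hour,
        # Credits per minute equal vCPU-minutes per minute; spread over all vCPUs.
        "implied_utilization": credits_per_minute / vcpus,
        # How far the observed rate sits above the nominal allocation.
        "ratio_to_nominal": credits_per_hour / nominal_credits_per_hour,
    }


summary = credit_consumption_summary(0.43, vcpus=2, nominal_credits_per_hour=24)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```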


But what does this mean for the instance’s performance? Amazon thinks the instance is running at 21% utilization (as reported by CloudWatch). What does the operating system think?

Looking at operating system performance statistics for the same instance, we see a very different picture:


Utilization is, despite what CloudWatch shows, not constant; it jumps around, with peaks and sustained loads. How do we reconcile the two? According to CloudWatch, the system is using 21% of the available node resources when the operating system says it is running at 12%, and also 21% when the operating system says it is running at 80%. Huh?

It helps to think of things a bit differently. Think of the 21% as “the total work that can be done within the current constraint imposed by the CPU Credits.” Let’s call this 21 work units per second. The operating system is unaware of this constraint. Ask it to do 21 work units and it will finish in one second, then sit idle. It will think it could have done more if it had more to do, so over a minute it will report that it was busy for 1 second and idle for the next 59, or about 1.7% busy. Note, however, that this doesn’t mean the computer could have done 98% more work in that first second. Ask it to do 42 work units and it will take 2 seconds to churn them out, so the latency to complete the task doubles, even though the OS appears to have lots of idle CPU power.
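The work-unit picture can be turned into a toy model. This is a sketch of the analogy only, not of how the hypervisor actually schedules CPU time; both function names are invented for the illustration.

```python
def time_to_complete(work_units, capacity_units_per_second):
    """Seconds needed to finish a task under a fixed capacity ceiling."""
    return work_units / capacity_units_per_second


def os_busy_fraction(work_units, capacity_units_per_second, window_seconds):
    """Fraction of the window the OS would report as busy."""
    busy = min(window_seconds,
               time_to_complete(work_units, capacity_units_per_second))
    return busy / window_seconds


capacity = 21  # work units per second allowed by the Credit constraint
for task in (21, 42):
    latency = time_to_complete(task, capacity)
    busy = os_busy_fraction(task, capacity, window_seconds=60)
    print(f"{task} units: {latency:.0f}s latency, OS reports {busy:.1%} busy")
```

Doubling the task doubles the latency, while the busy percentage stays tiny in both cases, which is why the OS numbers look so reassuring.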

We can see this in simple benchmarks: two identical T2.medium instances given the same workload can take very different times to complete it. One with plenty of CPU Credits will complete a sysbench test much quicker:

sysbench --test=cpu --cpu-max-prime=2000 run

sysbench 0.4.12:  multi-threaded system evaluation benchmark

Number of threads: 1

Maximum prime number checked in CPU test: 2000

Test execution summary:
    total time:                          1.3148s
    total number of events:              10000

While an identical instance, but with zero CPU credits, will take much longer to do the same work:

Test execution summary:
    total time:                          9.5517s
    total number of events:              10000

Both systems reported, from the OS level, 50% CPU load (a single core of a dual-core system running at 100%). But even though they are identical ‘hardware’, they took vastly different amounts of time to do the same work.
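To put a number on "vastly different", the two sysbench summaries can be compared directly. The snippet below uses only the total times and event counts shown above; the helper name is illustrative.

```python
def events_per_second(total_events, total_seconds):
    """Throughput implied by a sysbench summary."""
    return total_events / total_seconds


events = 10000
with_credits_s = 1.3148      # instance with plenty of CPU Credits
without_credits_s = 9.5517   # identical instance with zero CPU Credits

slowdown = without_credits_s / with_credits_s
print(f"slowdown: {slowdown:.2f}x")
print(f"with credits:    {events_per_second(events, with_credits_s):.0f} events/s")
print(f"without credits: {events_per_second(events, without_credits_s):.0f} events/s")
```

The instance without Credits takes a little over seven times as long to do the same work, even though both report the same OS-level load.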

Effectively, this means that a CPU can be ‘busy’, but not doing any work (when it is out of Credits and has used its base allocation for that moment). It seems very analogous to the “CPU Ready” counter in VMware environments, which indicates that the guest OS had work to do, but could not schedule a CPU to do it on. This means that the “idle” and “busy” CPU performance metrics are, if you are out of CPU Credits, not indicators of the ability to do more work – instead they are indicators of the ability to put more work on the processor queue. And of course, when you have more things in the queue, you have more latency.

So, clearly you need to pay attention to the CPU Credits. Easy enough to do if you are using LogicMonitor – the T2 Instance Credits DataSource does this automatically for you. (This may already be in your account, or else can be imported from the core repository.) This DataSource plots the CPU Credit balance, and the rate at which they are being consumed, so you can easily see your credit behavior in the context of your OS and CloudWatch statistics:


This DataSource also alerts you when you run out of CPU Credits on your instance, so you’ll know if your sudden spike in apparent CPU usage is due to being throttled by Amazon, or by an actual increase in workload:
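The distinction that alert draws (throttling versus a real increase in workload) can be sketched as a simple rule. This is a hypothetical heuristic for illustration, not LogicMonitor's actual alert logic: the function name and both thresholds are assumptions you would tune for your own environment.

```python
def diagnose_cpu_spike(credit_balance, os_cpu_percent,
                       empty_balance=1.0, busy_threshold=50.0):
    """Guess why OS-level CPU usage looks high on a T2 instance.

    credit_balance: latest CPUCreditBalance value from CloudWatch.
    os_cpu_percent: CPU utilization as reported by the operating system.
    empty_balance, busy_threshold: hypothetical cut-offs, not AWS values.
    """
    if os_cpu_percent < busy_threshold:
        return "no spike"
    if credit_balance <= empty_balance:
        # Busy CPU with no Credits left: work is queuing behind the throttle.
        return "likely throttled (out of CPU Credits)"
    return "likely real workload increase"


print(diagnose_cpu_spike(credit_balance=0.0, os_cpu_percent=80.0))
print(diagnose_cpu_spike(credit_balance=120.0, os_cpu_percent=80.0))
```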

But what do you do if you get an alert that you’ve run out of CPU Credits? Does it matter? Well, like most things – it depends. If your instance is used for a latency sensitive application, then this absolutely matters, as it means your CPU capacity is reduced, tasks will be queued, and having idle CPU no longer means you have unused capacity. For some applications, this is OK. For some, it will ruin the end user experience. So having a monitoring system that can monitor all aspects of the system – the CloudWatch data, the OS-level data, and the application performance – is key.

One other note: T2 instances are the cheapest instance type per GB of memory. If you need memory, but can deal with the baseline CPU performance – then running a T2 instance, even though you consume all the CPU Credits all the time, may be a reasonable choice.

Hopefully that was a useful breakdown of the real world effect of exhausting your CPU Credits.
