What is the expected network performance between Amazon EC2 instances? What is the available bandwidth between Amazon EC2 and Amazon S3? How about in and out of EC2? These are common questions that we get regularly. While we more or less know the answers to them out of our own experience in the past 15 months, we haven't really conducted a clean experiment to put some more precise numbers behind them. Since it's about time, I've decided to do some informal experiments to measure some of the available network performance around EC2.
Before we start, though, a disclaimer: As in the drug commercials, results may vary! :) The results presented here use a couple of EC2 instances and therefore should only be interpreted as typical or possible results. The only claim that we make here is that these are the results we got, and therefore we expect that perhaps this is an indication of available performance at this point. Amazon can make significant hardware and architectural changes that could greatly alter the results.
Let's start with some experiments to measure the performance from EC2 instance to instance.
Performance Between Amazon EC2 Instances
In this first experiment I boot a couple of EC2 large instances. I make one of them a "server" by setting up Apache and copying some large files into it, and use the other instance as a client by issuing HTTP requests with curl. All file transfers are made out of memory-cached blocks, so there's virtually no disk I/O involved in them.
This experimental setup consists of two Large EC2 instances:
- Server: Apache (non-SSL) serving 1-2GB files (cached in memory)
- Client: curl retrieving the large files from the server
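The client side of this setup can also be scripted. The following is a minimal Python sketch of the same idea (several concurrent HTTP downloads, timed together), not the commands used for the numbers in this post, which came from plain curl. The function names `fetch_size` and `measure_parallel` are hypothetical, and the result is treated as decimal megabytes per second. At rates close to 100MB/s the Python client itself may become the bottleneck, so treat it as an illustration rather than a benchmark tool.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch_size(url, chunk_size=1 << 20):
    """Download url, discard the data, and return the number of bytes received."""
    total = 0
    with urllib.request.urlopen(url) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


def measure_parallel(url, streams, fetch=fetch_size):
    """Run `streams` concurrent downloads of url and return the aggregate MB/s.

    `fetch` is injectable so the timing logic can be exercised without a network.
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=streams) as pool:
        sizes = list(pool.map(fetch, [url] * streams))
    elapsed = time.perf_counter() - start
    return sum(sizes) / elapsed / 1e6  # bytes -> decimal megabytes per second
```

Calling `measure_parallel(url, 3)` with the URL of a large file on the server instance would correspond to the three-curl case; the URL itself is something you would have to supply.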
These two instances seem to be separated by an intermediate router, so they don't seem to be on the same host. This is the traceroute across them:
traceroute to 10.254.39.48 (10.254.39.48), 30 hops max, 40 byte packets
 1 dom0-10-254-40-170.compute-1.internal (10.254.40.170) 0.123 ms 0.102 ms 0.075 ms
 2 10.254.40.2 (10.254.40.2) 0.380 ms 0.255 ms 0.246 ms
 3 dom0-10-254-36-166.compute-1.internal (10.254.36.166) 0.278 ms 0.257 ms 0.231 ms
 4 domU-12-31-39-00-20-C2.compute-1.internal (10.254.39.48) 0.356 ms 0.331 ms 0.319 ms
Using a single curl file retrieval, we were able to get around 75MB/s consistently. Adding additional curls uncovered even more network bandwidth, reaching close to 100MB/s. Here are the results:
- 1 curl -> 75MB/s (cached, i.e., no I/O on the apache server)
- 2 curls -> 88MB/s (2x44MB/s) (cached)
- 3 curls -> 96MB/s (33+35+28 MB/s) (cached)
I did not repeat the experiments using SSL. However, I did some additional tests transferring files using scp across the same instances. Those tests seem to max out at around 30-40MB/s regardless of the amount of parallelism, as the CPU becomes the bottleneck.
This is really nice. Basically, we're getting close to a full gigabit per second between the instances. Now, let's take a look at what we get when EC2 instances talk to S3.
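The gigabit comparison can be checked with a quick unit conversion. This small Python sketch assumes the reported MB/s figures are decimal megabytes of payload; if the tool reports in units of 2^20 bytes, the bit rates come out about 5% higher, and TCP/IP and Ethernet headers add a few percent more on the wire. The helper name `mbytes_to_mbits` is made up for this example.

```python
def mbytes_to_mbits(mb_per_s):
    """Convert megabytes per second to megabits per second (1 byte = 8 bits)."""
    return mb_per_s * 8


# Aggregate payload rates reported above for 1, 2 and 3 concurrent curls.
for curls, rate in [(1, 75), (2, 88), (3, 96)]:
    print(f"{curls} curl(s): {rate} MB/s = {mbytes_to_mbits(rate)} Mbit/s")
```

Under those assumptions the three-curl run corresponds to 768 Mbit/s of payload, which is in the neighborhood of what a gigabit link can deliver once protocol overhead is included.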
Performance Between Amazon EC2 and Amazon S3
This experiment is similar to the previous one in the sense that I use curl to download or upload files from the server. The server, however, is s3.amazonaws.com, still using HTTP and HTTPS since S3 is a REST service.
This experimental setup consists of one Large EC2 instance:
- Client: curl retrieving objects from, or uploading objects to, S3
- Server: Amazon S3 (i.e., s3.amazonaws.com) serving (or storing) 1GB files
The trace to the selected s3 server looks like:
traceroute to s3.amazonaws.com (22.214.171.124), 30 hops max, 40 byte packets
 1 dom0-10-252-24-163.compute-1.internal (10.252.24.163) 0.122 ms 0.150 ms 0.209 ms
 2 10.252.24.2 (10.252.24.2) 0.458 ms 0.348 ms 0.409 ms
 3 othr-216-182-224-9.usma1.compute.amazonaws.com (126.96.36.199) 0.384 ms 0.400 ms 0.440 ms
 4 othr-216-182-224-15.usma1.compute.amazonaws.com (188.8.131.52) 0.990 ms 1.115 ms 1.070 ms
 5 othr-216-182-224-90.usma1.compute.amazonaws.com (184.108.40.206) 0.807 ms 0.928 ms 0.902 ms
 6 othr-216-182-224-94.usma1.compute.amazonaws.com (220.127.116.11) 151.979 ms 152.001 ms 152.021 ms
 7 18.104.22.168 (22.214.171.124) 2.050 ms 2.029 ms 2.087 ms
 8 126.96.36.199 (188.8.131.52) 2.654 ms 2.629 ms 2.597 ms
 9 * * *
So, although the server itself doesn't respond to ICMPs, the trace tells that there's a significant path to be traversed.
Let's start with downloads, more specifically with HTTPS downloads. The first thing that I noticed is that the performance of a single download stream is quite good, at around 12.6MB/s. While download performance doesn't scale linearly with the number of concurrent curls, it is possible for a large instance to reach higher download speeds when downloading several objects in parallel. The maximum performance seems to flatten out around 50MB/s. At that point the large instance is operating at a CPU usage of around 22% user plus 20% system, which given the SSL encryption going on is nice.
Here are the raw HTTPS numbers:
- 1 curl -> 12.6MB/s
- 2 curls -> 21.0MB/s (10.5+10.5 MB/s)
- 3 curls -> 31.3MB/s (10.2+10.0+11.1 MB/s)
- 4 curls -> 37.5MB/s (9.0+9.1+9.8+9.6 MB/s)
- 6 curls -> 46.6MB/s (8.0+7.8+7.6+7.9+7.8+7.5 MB/s)
- 8 curls -> 49.8MB/s (6.0+6.3+7.0+6.1+6.0+5.9+6.2+6.3 MB/s)
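To make the "doesn't scale linearly" observation concrete, here is a small Python sketch that takes the aggregate HTTPS numbers from the list above and computes the average per-stream rate and the fraction of ideal linear scaling (aggregate divided by the number of streams times the single-stream rate). The function name `scaling_table` is hypothetical, and the only inputs are the figures already listed.

```python
# Aggregate HTTPS download rates (MB/s) from the list above, keyed by number of curls.
https_download = {1: 12.6, 2: 21.0, 3: 31.3, 4: 37.5, 6: 46.6, 8: 49.8}


def scaling_table(rates):
    """Return (streams, average per-stream MB/s, fraction of linear scaling) rows."""
    single = rates[1]
    rows = []
    for streams, total in sorted(rates.items()):
        per_stream = total / streams
        efficiency = total / (streams * single)
        rows.append((streams, round(per_stream, 1), round(efficiency, 2)))
    return rows


for streams, per_stream, efficiency in scaling_table(https_download):
    print(f"{streams} curls: {per_stream} MB/s each, {efficiency:.0%} of linear")
```

With eight streams the aggregate is about half of what perfect linear scaling from the single-stream rate would give, which matches the flattening described above.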
The SSL encryption uses RC4-MD5, so there is a fair amount of work for both S3 and the instance to do. The next natural question is whether there's more to gain when talking to S3 without SSL. Unfortunately, the answer is no. While the load on the client drops significantly (from 22% to 5% user and from 20% to 14% system when using eight curls), the available bandwidth using non-SSL is basically the same; the differences fall within the margin of error. This leads me to believe that in either case the instance is not the bottleneck. Here are the same data points for non-SSL (HTTP) downloads:
- 1 curl -> 10.2 MB/s
- 2 curls -> 20.0 MB/s (10.1+9.9 MB/s)
- 3 curls -> 29.6 MB/s (10.0+9.7+9.9 MB/s)
- 4 curls -> 37.6MB/s (9.1+9.4+9.4+9.7MB/s)
- 6 curls -> 46.5 MB/s (7.8+7.8+7.6+7.9+7.8+7.6 MB/s)
- 8 curls -> 51.5 MB/s (6.6+6.4+6.6+6.3+6.2+6.2+6.7+6.3 MB/s)
Interestingly enough, a single non-SSL stream seems to get less performance than an SSL one (10.2MB/s vs 12.6MB/s). I didn't check whether the SSL stream uses compression; that may be one reason this is occurring.
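The HTTPS and HTTP results can be put side by side with a few lines of Python. This is only arithmetic on the two lists above; the helper name `relative_difference` is made up for the example.

```python
# Aggregate download rates (MB/s) from the two lists above, keyed by number of curls.
https = {1: 12.6, 2: 21.0, 3: 31.3, 4: 37.5, 6: 46.6, 8: 49.8}
http = {1: 10.2, 2: 20.0, 3: 29.6, 4: 37.6, 6: 46.5, 8: 51.5}


def relative_difference(a, b):
    """Return how much a differs from b, as a percentage of b."""
    return (a - b) / b * 100


for streams in sorted(https):
    diff = relative_difference(https[streams], http[streams])
    print(f"{streams} curls: HTTPS is {diff:+.1f}% relative to HTTP")
```

The single-stream case stands out at roughly 24% in favor of HTTPS, while from four streams upward the two protocols are within a few percent of each other.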
So how about uploads? I'll use the same setup but using curl to upload a 1GB file using a signed S3 URL.
The first interesting thing to notice from the results is that a single upload stream gets roughly half the bandwidth of a single download stream (6.9MB/s vs. 12.6MB/s). However, the good news is that the upload bandwidth still scales when using multiple streams.
Here are the raw numbers for SSL uploads:
- 1 curl -> 6.9MB/s
- 2 curls -> 14.2MB/s
- 4 curls -> 23.6MB/s
- 6 curls -> 37.6MB/s
- 8 curls -> 48.0MB/s
- 12 curls -> 53.8MB/s
In other words: give me some data and I'll fill up S3 in a hurry :-). So what about using non-SSL uploads? Well, that turned out to be an interesting one. I've seen a single curl upload achieve the same performance as download - one curl upload with no SSL can achieve 12.6MB/s. But over quite a number of experiments I've seen non-SSL uploads exhibit a weird behavior where some of them mysteriously slow down and linger for a while almost idle (at a very low bandwidth). The end result is that the average bandwidth at the end of the run varies by a factor of almost 2x. I'm still investigating to see what happens.
The bottom line from these experiments is that Amazon is providing very high throughput around EC2 and S3. Results were readily reproducible (except for the problem described with the non-SSL uploads) and support high-bandwidth, high-volume operation. Clearly, if you put together a custom cluster in your own data center you can wire things up with more bandwidth, but for a general-purpose system, this is a ton of bandwidth all around.
myke Do you have any numbers of traffic outside the amazon network?
blanquer Do you mean from/to EC2? As far as we know, there’s no explicit throttling going in and out of EC2 from/to the Internet. So I would assume that most likely the bottleneck is your Internet connection (i.e., there are not many people that have Gigabit Ethernet pipes to the Internet…:-) ). But as far as numbers to back that up…no. I don’t have them. Josep M.
Roberto Great analysis! I always wanted to know these numbers. Thanks for sharing.
Thorsten Myke, it’s pretty clear that Amazon’s “internet connection” is not a bottleneck. Your connection is the most likely bottleneck, second would come TCP window size bottlenecks, and third probably contention inside Amazon’s fabric. I don’t have any knowledge of Amazon’s fabric, but if you stick 40 servers into a rack it’s kind’a tough to have 40Gbps+ of bandwidth into that rack. Actually, not tough but quickly bloody expensive… So if the other 39 guys are not pumping a Gbps at the same time as you are, everything is fine, but if they are then, well, everyone suffers…
Al This is great info, thanks for doing the tests. It gives us a real feel for the internal Amazon network performance, which appears excellent overall given its scale! On the differences EC2 – EC2 vs EC2 – S3, the hops between EC2 and S3 traverse more than one network; they could even be in different data centers. Also your caching between EC2 and EC2 instances is likely to be different from S3 caching, so S3 disk/chunk storage may start playing a role. But this illustrates one thing: don’t bother using the EC2 disks; architect using RAM, sockets and S3. PS: have you done any disk performance tests on EC2 instances? Regards, Al
abhik Am I correct in assuming that Amazon is not throttling/QoSing the connections between EC2 instances or between EC2 and S3? If so, then the throughput could be much less if there are more instances running on the physical boxes once EC2 becomes more popular?
Greg How about a bisection bandwidth measurement with many instances? A 2 instance cluster isn’t what most people use EC2 for.
Thorsten Al: you are correct that S3 has “a bit” more work to do with the data than Apache in our tests. We didn’t intend these as an apples-to-apples comparison, although the way we wrote this up in one blog entry may have made it look like that. Disk measurements sound like a good idea!
Abhik: Amazon is pretty clearly limiting the throughput you get on a small instance, and they officially state 250Mbps. They don’t say much about large instances, and evidently the Gbps is not throttled. We don’t know how many large instances fit onto a physical box, but it wouldn’t be entirely unreasonable to assume that each instance could get its own physical network interface.
Greg: a 2 instance cluster is very interesting, for example for MySQL master/slave replication, so a lot of people actually do use EC2 for that. But I also understand what you’re talking about. We haven’t experimented in that space, maybe some of the hadoop users on EC2 can provide you with this type of info. I suspect that larger cluster tests become a lot more dependent on the actual instance to host allocation.