Amazon launches new cloud in Europe

Amazon just unveiled an EC2 cloud in Europe; see the AWS feature guide on this topic. I assume blog posts by Jeff Barr and by Werner Vogels will be following shortly. This EU offering has long been rumored and has been requested many times over by customers in the EU as well as by US customers with sizable European user bases. EC2 EU has finally arrived, and it looks like it has landed in Ireland (at least a quick traceroute suggests that). I hope the Amazon team is ready for the customer onslaught!

The EC2 EU announcement is very significant in a way that may not be entirely obvious at first glance: it’s a separate EC2 deployment, tied as little as possible to the current US EC2 installation. The reason for this is simple: availability. The two installations basically share nothing other than the account credentials, so a massive failure in one is extremely unlikely to affect the other in any way. As a result EC2 users now have many levels of redundancy available to them: they can run services on multiple instances for local redundancy, they can split servers up into multiple availability zones within one region, and now a secondary or disaster recovery site can be set up in another region. With RightScale we’re further making it possible to use multiple cloud providers to gain yet another level of redundancy. Sweet!

For users that want to operate in both the US and the EU regions at the same time, the separation of the two regions introduces some friction. The resources are all bound to the region, which means that while you use the same account credentials to access the US and EU services, you can’t launch an AMI registered in the US on an EU instance, and you can’t use the same security group for servers in both regions. The EC2 team released an ec2-migrate-bundle tool to make copying of AMIs easy, and we will also look at the use-cases ourselves and implement replication tools and automation in RightScale to make all this a bit simpler to manage. Overall the decision to keep the regions separate is definitely the right one: tools can help to make everything much more seamless than it really is. The converse would not be true, i.e. creating high failure independence would not be possible if the regions were tightly integrated. Amazon made the right decision there.

The RightScale system will support the EU region in a few weeks. Unfortunately we have some work still ahead of us to handle the multi-region structure. Our vision is to deploy easily across clouds - Amazon’s US and EU regions as well as other clouds. In the meantime we’ve already copied our RightImages to the EU region and registered them there. We’d love to hear from you what tools and features you’d like to see to make operating across the regions as seamless as possible.

Expanding RightScale with $13M new funding

I’m very happy to report that we just received $13M in funding from Index Ventures and Benchmark Capital! We will be using the funding to accelerate product and market development of our cloud computing management platform. In simple words: more money means more developers writing more features, supporting more clouds, enabling more automation, and making even more customers happy!

We’ve been supporting all cloud services from Amazon for a long time and we’ve recently added support for GoGrid and FlexiScale. We also interoperate with Eucalyptus and RackSpace’s CloudServers will be coming next. We continue to gear up to be able to support many more clouds as soon as they become publicly available and we’re building more and more infrastructure to make it easy to run deployments in multiple clouds at the same time.

At the same time we’re adding clouds we are also working with ISVs like MySQL, Bitrock, EnterpriseDB, and rPath to offer their software within the RightScale platform. In fact, we just completed the MySQL over EBS server templates and we’re getting close with support for Splunk. All this contributes to the ecosystem around RightScale which will offer more and more ready-to-go software to our users.

We’re also continuing to add more levels of automation to our service. The infrastructure clouds automate the provisioning of servers, but that only gets you to the login prompt. To realize the vision of cloud computing you really need servers that not only get provisioned automatically but that go into full production on autopilot. That’s what we’ve been specializing in and will continue to enhance.

Working with a European venture firm meshes well with the customer interest we’ve received from that part of the world. Hopefully the long-rumored Amazon EC2 offering in the EU will turn into reality soon; that would make cloud computing even more attractive for our European friends. But even with the current state of clouds, interest in RightScale has been global and we have customers in many countries. I’m sure that by the end of 2009 there will be cloud offerings on all continents (except Antarctica, I suppose :-).

RightAWS 1.9.0 released

This is a quick note that we released a new version of our Ruby interface for all AWS cloud services. It now includes support for the EC2 Windows features as well as for CloudFront, plus a number of bug fixes and enhancements. Download from http://rubyforge.org/projects/rightaws and enjoy!

Amazon releases CloudFront: a cloud content distribution network

Amazon just made its CloudFront service public and in time-honored tradition RightScale offers full support for the new service in its dashboard. You can read Jeff Barr’s announcement and Werner’s blog post for the details. I must say that the folks at AWS have been very consistent in listening to users and reacting to what they hear: many, many users of S3, the Simple Storage Service, have been serving up web assets directly from S3 as if it were a CDN. This works reasonably well in that S3 is very scalable and can sustain very high request rates. But it also has its limitations, and lack of geographic replication is one of them in the context of CDN use. So offering a true solution to getting web content rapidly to browsers across the world is most welcome.

CloudFront is a content distribution service that caches S3 content at 14 locations on three continents based on the access patterns to the individual S3 objects. As far as I can tell, it’s a service that is quite distinct from S3, except that it currently uses S3 as the origin server (i.e. the original objects to be cached must reside on S3). To use CloudFront you create what is called a distribution for an S3 bucket, and it returns a DNS name that you then use to refer to the cached version of that bucket. For example, let me create a distribution for a test bucket I have lying around:

[Screenshot: cloudfront new]

And now let’s look at the result:

[Screenshot: cloudfront show]

As you can see, CloudFront returned the domain dc5eg4un365fp.cloudfront.net to access the bucket content. To make URLs a bit prettier to look at, I specified a CNAME of blog-demo.rightscale.com and I could now go into my DNS service and create that CNAME so I could refer to the cached content using this nicer domain name. The info also shows that CloudFront is in the process of setting up the distribution, which in this case took a couple of minutes at most. The RightScale dashboard provides access to all the CloudFront functions and shows the status of all your distributions. Plus you can give distributions nicknames and write down some notes to jog your memory later on.

As a user of CloudFront you don’t really have control over the caching, you just put links to CloudFront in your pages and it does the rest. When a browser tries to access an object it gets directed to the “best” location through the DNS lookup process. The CloudFront server it hits either returns the content from cache, or if it’s not there, it requests it from S3 and adds it to its cache. Eventually infrequently accessed content is evicted from the cache.

There are a number of restrictions on CloudFront. First of all, as mentioned above, the origin server has to be S3. Second, all the objects must be publicly accessible from S3, which makes sense although in some scenarios it would be convenient if this were not required. Third, CloudFront only supports HTTP, not HTTPS, which is an issue for SSL sites because including non-SSL images brings up annoying browser warning pop-ups. Lastly, CloudFront doesn’t provide the type of detailed usage reports that other CDNs offer.

Something that is confusing at first is the pricing. The key to understanding the pricing is to think of CloudFront as a service that is completely separate from S3. Imagine it were running on your own servers that you placed in 14 datacenters around the world and you were computing what it’d cost you. When a CloudFront server first gets a request for an object it has to retrieve it from S3, which incurs the normal S3 per-request and bandwidth charges. And then there are the per-request and bandwidth charges for the CloudFront server itself every time it serves an object up to a browser. I’ll leave the details to the Amazon docs, but if you keep in mind that they’re separate services it’ll make sense.
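To make the "two separate services" view of the pricing concrete, here is a toy Python model of a single edge location. It is only a sketch: the `EdgeCache` class and its counters are made up for illustration, it ignores cache eviction and bandwidth charges, and it has nothing to do with Amazon's actual implementation or API.

```python
from collections import Counter


class EdgeCache:
    """Toy model of one CloudFront edge location (illustrative only)."""

    def __init__(self, origin):
        self.origin = origin      # dict standing in for the S3 bucket
        self.cache = {}           # objects currently held at the edge
        self.billed = Counter()   # billable request counts per service

    def get(self, key):
        # Every browser request counts as a CloudFront request.
        self.billed["cloudfront_requests"] += 1
        if key not in self.cache:
            # Cache miss: the edge fetches from S3, which counts as an S3 request.
            self.billed["s3_requests"] += 1
            self.cache[key] = self.origin[key]
        return self.cache[key]


edge = EdgeCache(origin={"logo.png": b"...image bytes..."})
for _ in range(1000):
    edge.get("logo.png")
print(dict(edge.billed))
```

In this model a thousand browser hits on the same object produce a thousand CloudFront requests but only a single S3 request, which is the reason the two sets of charges have to be thought about separately.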

I can’t comment much on the performance of CloudFront as I’m lacking the infrastructure (and time) to test the service. Early reports indicate that download times with CloudFront are lower than directly from S3 and show less jitter. I’m sure there will be a hot debate about whether Amazon has enough sites around the world and whether the routing from users to CloudFront is as good as it needs to be.

All summed up, this is a service a lot of S3 users have been asking for and Amazon now delivers. It’s also a service that is difficult for cloud users to implement on their own: we really need an organization like Amazon to solve the umpteen logistical nightmares it takes to deploy infrastructure around the world. Of course every time Amazon brings out a cool new service there are always more features we’d love to have, and I’m sure many of them will be forthcoming. In the meantime, I’m off figuring out where we’ll use CloudFront ourselves…

Windows! SLA! Beta bye-bye! All on Amazon EC2 today!

The cloud is coming of age. Amazon has taken another huge leap forward today by announcing that EC2 is now out of beta, together with an SLA that is evidence of Amazon’s commitment to provide top-notch service. Their uptime has been stellar, and they are now standing behind their offering contractually in a much stronger way, and signaling how customers can set their expectations. Read about the announcement on the AWS blog and on their CTO’s blog.

The advent of Windows on EC2 is welcome news too.  Even though Windows is not typically the OS used for serving highly variable workloads, it is a sign of the cloud maturing that even the more static workloads typical of Windows deployments will be more easily allocated and managed using cloud resources. (Mhh, I wonder whether the Microsoft Professional Developers Conference happening next week has something to do with the timing of the announcement…)

RightScale is of course supporting the new features that enable launching and bundling Windows instances, and we’ll have everything on our production systems within a couple of days. (Update: our Windows support went live the same day as Amazon’s announcement and we’ll be adding some more functionality soon; feedback and suggestions are always appreciated!) In case you’re wondering, EC2 is supporting Windows Server 2003 R2 for the time being. Windows Server 2008 is apparently on the  roadmap but not available at present and it’s apparently against the T&C’s to upgrade on your own.

But let’s shift over to the differences between Windows and Linux instances (apart from the obvious).

Launching

Launching a Windows instance really is no different from launching a Linux instance: you just pick a different machine image (AMI). But once it’s running, the game changes: SSH is not exactly the most popular remote access tool for Windows, so instead you get to use RDP, Windows’ Remote Desktop Protocol. But there’s a catch: what’s the administrator password? Well, Amazon has concocted something I can’t really describe with any other word but a hack: at boot time, the ec2-configuration-service that Amazon added to the Windows AMIs generates an admin password randomly, encrypts it with your SSH public key, and writes it to the console output. You then use a command line tool (or ElasticFox) that reads the console output, locates the encrypted password, and uses your SSH private key to decrypt it. Then you get to type the password into the RDP client. [Expletive deleted...]

We’ll have an RDP button in the RightScale UI that will automate all this and get you into your server with far fewer hassles. Launching an RDP client from the web browser isn’t very smooth, unfortunately, especially as we want to support non-Windows users.

Bundling

Bundling is very different on Windows instances than Linux instances. The Linux approach of creating a loopback filesystem in a file, tar-ing and encrypting up the root disk onto that filesystem and then uploading that to S3 doesn’t quite cut it. Not that the process is all that great under Linux either: it’s one of the most fragile and frustrating aspects of EC2, and one we avoid using as much as possible with our server templates and RightScript mechanism.

For Windows there now is a “please bundle my instance, will you” API call to EC2. Nice! Except for the fact that it will shut the instance down in order to bundle it up! In Amazon’s words: “Internally, it queues the bundling task and shuts down the instance. It then takes a snapshot of the Windows volume bundles it, and uploads it to S3.” The API gets a couple of new calls to start the bundling and then to query on the progress of the bundling.

Of interest here is also the fact that Amazon recommends deleting all temp files using the Windows Disk Cleaner tool, then defragmenting, and finally zeroing the free space using “sdelete.” The last step is presumably because they’re bundling the raw disk partition and not the files in the filesystem and zeroing the unused space reduces the size of the compressed image.

For the RightScale UI we rolled all these API calls into a single bundling button: you press it, we make the calls, EC2 makes it happen, you watch the progress.

Mounting EBS volumes

Another slight difference is mounting Elastic Block Store (EBS) volumes on a Windows instance. You can theoretically attach up to 8 volumes to an instance, and they appear as drive letters ‘a’ through ‘h’. But the local disks also appear using these drive letters, so the low-down is that you can mount 5 EBS volumes on a small instance, 4 on a large, and 2 on an extra-large.

Pre-announced monitoring and auto-scaling services

We’re quite excited about Amazon’s pre-announcement of monitoring and auto-scaling services. The details are still quite sketchy through our sources, but all indications are that they’ll integrate very nicely into the RightScale system, giving our customers the choice of using our monitoring system or Amazon’s or both. We’ve been focusing on all the configuration management and dynamic configuration that needs to occur when doing autoscaling, which is much more than just launching instances when the monitoring system says it’s necessary. On top of that, the architecture of the multi-server deployment must be designed to actually support auto-scaling as well as failure tolerance. This is precisely why we offer our customers server templates for popular software stacks with all the hooks for auto-scaling already in place.

All in all, the announcement amounts to two great leaps forward for the cloud computing world:  broader OS support and a stronger business commitment for EC2.  It seems that cloud solutions get stronger and stronger with each passing quarter.  Of course, managing the increasing complexity through design, architecture and automation remains a critical ingredient in this picture — and one that continues to be our main focus at RightScale.

RackSpace unveils cloud strategy

In a very exciting web broadcast RackSpace yesterday announced their cloud strategy. Their commitment to the cloud started to become visible with Mosso and then the CloudFS beta, which is a storage system looking very much like Amazon S3. They have now renamed CloudFS to CloudFiles and unveiled CloudServers, which will offer a service competing with Amazon EC2, which we will support as soon as it becomes publicly available. They are also acquiring Slicehost and JungleDisk. I haven’t used Slicehost, but I’m an avid JungleDisk user and am really happy for the momentum this will add behind JungleDisk!

Rackspace’s announcements are momentous because they come from one of the leading managed hosting providers — one with the foresight to see that cloud architectures are a critical path to the future for both them and their customers. This is more evidence of the groundswell of support that cloud computing is experiencing. With Rackspace’s long history in the hosting business and their focus on ‘fanatical support,’ we expect them to bring an interesting new combination of strategic strengths to the market. As we announced last month, RightScale is working closely with them to support their cloud suite on release.

Why Amazon’s Elastic Block Store Matters

On the technical side, Amazon’s EBS service may look like “just” another great new feature of the Elastic Compute Cloud, but on the business side it enables a whole slew of new customers. I won’t pretend that I understand all the new uses, but I can talk about those we see and are supporting.

First a couple of words about what EBS is. In short it’s a SAN (Storage Area Network) in the cloud. You can allocate a disk volume of 1GB to 1TB in size from what is now an endless SAN in the cloud and attach it to an instance of yours running in EC2. The volume is stored on redundant disks (i.e. with some form of RAID) and has a lifetime separate from any instance on which it is mounted, so you can unmount it and later remount it on another instance. You can also perform a snapshot backup of a volume to S3, where it is stored with the redundancy and durability of all objects on S3. Moreover, successive snapshots are incremental providing a very powerful and efficient incremental backup capability for volumes.

All this and much more is explained in detail in my other post and there’s yet more detailed EBS information on our support site. The official EBS announcement is on the EC2 detail page, Werner Vogels provides some background, and Jeff Barr’s blog entry has links to many other related announcements.

The RightScale dashboard supports all the features of EBS and offers a number of additional goodies such as configuring volumes to automatically be attached to servers when these launch and keeping track of the ancestry of a volume or snapshot.

What does EBS enable? In short: traditional processing on large datasets and reliable storage for many servers. Let’s look at these two areas one-by-one.

Large datasets

Amazon Web Services are designed for scale. EC2, S3, SQS, and SDB are ideally suited for building large systems that process huge data volumes. The catch has been that they are geared towards modern service oriented systems that can use storage accessed via HTTP PUTs and GETs (Amazon S3), can work using a non-relational database like Amazon SDB, and thrive on large numbers of simple servers (EC2). Users that have more traditional applications, such as relational databases, that require large datasets stored in a file system with a POSIX interface have had difficulties in meeting all their requirements for operating in AWS. While an EC2 X-large instance comes with about 1.4TB of local disk it is rather difficult to actually use this disk space in a production system. Populating the disk with data at boot time can take hours and backups, replication and restoring the data in case of an instance failure are all sore points. For up to 100GB the timescales are all workable, but beyond that it gets difficult.

With EBS the processing of large datasets contained within a file system becomes easily accessible. First of all, volumes can be up to 1TB in size and beyond that it is possible to mount multiple volumes on the same instance such that file systems of 10TB are practical. The volumes can further be backed-up to S3 using the snapshots and they can be replicated by creating new volumes from the snapshots. What is particularly nice is that a volume can be created in any availability zone (think datacenter) of a region from a snapshot, so copying a large volume across datacenters can be off-loaded to EBS and is done very efficiently.

Many virtual appliance servers

EBS also really enables SaaS vendors that use a single-tenant “virtual appliance” model. Many software vendors have approached us with use-cases where they would like to run individual servers on behalf of their customers. Often these servers are co-managed between customer and software vendor or have other properties that make the service inappropriate for a multi-tenant SaaS implementation. In these use-cases the end-customer is storing important data on these servers and requires a robust data safeguarding architecture, in particular for database storage. While we today have a very effective MySQL replication and backup solution, it is really geared at multi-server set-ups and doesn’t fit the price and complexity budget of cookie-cutter single-server virtual appliances. For those use-cases EBS brings the desired performance and reliability and drops the complexity and price.

With EBS the canonical reliable single-server virtual appliance can be implemented with the following architecture: an EC2 instance whose type is chosen for the cpu and memory required, an EBS volume sized appropriately for the data set, a revolving set of frequent snapshots providing disaster recovery backups, and periodic application-level “export” of backups to S3 for archiving and off-cloud backups. In case of a total failure of the EC2 instance and the EBS volume (e.g. datacenter fire) a new instance and volume can be allocated in another availability zone from the last revolving snapshot.

When it comes time to upgrade the virtual appliance to a new software version it becomes relatively easy for the software vendor to spin-up a second instance and volume with the upgraded software for important customers so they can test-drive the new version on their data and train their internal users before committing to the upgrade.

Try it out for yourself!

We’ve been busy integrating support for this new storage system for months so that you can start using it immediately. And our RightScale Dashboard support for EBS is available as part of our free Developer Edition! To learn more about EBS and RightScale’s support for it, check out my detailed technical review, read our EBS tutorials at wiki.rightscale.com, or register for our upcoming RightScale EBS Webinar. Or just drop us a line at sales@rightscale.com.

Amazon’s Elastic Block Store explained

Now that Amazon’s Elastic Block Store is live I thought it’d be helpful to explain all the ins and outs as well as how to use them. The official information about EBS is found on the AWS site. I’ve written about the significance of EBS before, and I’ll follow up with a post about some new use-cases it enables.

The Basics

EBS starts out really simple: you create a volume from 1GB to 1TB in size and then you mount it on a device (like /dev/sdj) on an instance, format it, and off you go. Later you can detach it, let it sit for a while, and then reattach it to a different instance. You can also snapshot the volume at any time to S3, and if you want to restore your snapshot you can create a fresh volume from the snapshot. Sounds simple, eh? It is, but the devil is in the details!

Amazon Elastic Block Store features

Reliability

EBS volumes have redundancy built-in, which means that they will not fail if an individual drive fails or some other single failure occurs. But they are not as redundant as S3 storage which replicates data into multiple availability zones: an EBS volume lives entirely in one availability zone. This means that making snapshot backups, which are stored in S3, is important for long-term data safeguarding.

I know that folks at Amazon have thought long and hard how to characterize the reliability of EBS volumes, so here’s their explanation taken from the EC2 detail page:

Amazon EBS volumes are designed to be highly available and reliable. Amazon EBS volume data is replicated across multiple servers in an Availability Zone to prevent the loss of data from the failure of any single component. The durability of your volume depends both on the size of your volume and the percentage of the data that has changed since your last snapshot. As an example, volumes that operate with 20 GB or less of modified data since their most recent Amazon EBS snapshot can expect an annual failure rate (AFR) of between 0.1% - 0.5%, where failure refers to a complete loss of the volume. This compares with commodity hard disks that will typically fail with an AFR of around 4%, making EBS volumes 10 times more reliable than typical commodity disk drives.

From a practical point of view what this means is that you should expect the same type of reliability you get from a fully redundant RAID storage system. While it may be technically possible to increase the reliability by, for example, mirroring two EBS volumes in software on one instance, it is much more productive to rely on EBS directly. Focus your efforts on building a good snapshot strategy that ensures frequent and consistent snapshots, and build good scripts that allow you to recover from many types of failures using the snapshots and fresh instances and volumes.

Volume performance

Our performance observations are based on the pre-release EBS volumes, thus some variations on the production systems should be expected. On the one hand our pre-release tests were probably running on a small infrastructure with fewer users, but on the other hand many of these users were also running stress tests, so it’s really hard to tell how all this will carry over. Only time will tell.

EBS volumes are network attached disk storage and thus take a slice off the instance’s overall network bandwidth. The speed of light here is evidently 1Gbps, which means that the peak sequential transfer rate is about 120MBytes/sec (1Gbps divided by 8 bits per byte is 125MBytes/sec before protocol overhead). “Any number larger than that is an error in your math.” We see over 70MB/sec using sysbench on an m1.small instance, which is hot! Presumably we didn’t get much network contention from other small instances on the same host when running the benchmarks. For random access we’ve seen over 1000 I/O ops/sec, but it’s much more difficult to benchmark those types of workloads. The bottom line though is that performance exceeds what we’ve seen for filesystems striped across the four local drives of x-large instances.

With EBS it is possible to increase the I/O transaction rate further by mounting multiple EBS volumes on one instance and striping filesystems across them. For streaming performance this doesn’t seem worthwhile as the limit of the available instance network bandwidth is already reached with one volume, but it can increase the performance of random workloads as more heads can be seeking at a time.

Snapshot backups

Snapshot backups are simultaneously the most useful and the most difficult to understand feature of EBS. Let me try to explain. A snapshot of an EBS volume can be taken at any time; it causes a copy of the data in the volume to be written to S3, where it is stored redundantly in multiple availability zones (like all data in S3). The first peculiarity is that snapshots do not appear in your S3 buckets, thus you can’t access them using the standard S3 API. You can only list the snapshots using the EC2 API and you can restore a snapshot by creating a new volume from it. The second peculiarity is that snapshots are incremental, which means that in order to create a subsequent snapshot, EBS only saves to S3 the disk blocks that have changed since previous snapshots.

How the incremental snapshots work conceptually is depicted in the diagram below. Each volume is divided up into blocks. When the first snapshot of a volume is taken all blocks of the volume that have ever been written are copied to S3, and then a snapshot table of contents is written to S3 that lists all these blocks. Now, when the second snapshot is taken of the same volume only the blocks that have changed since the first snapshot are copied to S3. The table of contents for the second snapshot is then written to S3 and lists all the blocks on S3 that belong to the snapshot. Some are shared with the first snapshot, some are new. The third snapshot is created similarly and can contain blocks copied to S3 for the first, second and third snapshots.

[Figure: illustration of EBS snapshots, showing incremental storage of a snapshot’s blocks in Amazon S3]

There are two nice things about the incremental nature of the snapshots: it saves time and space. Taking subsequent snapshots can be very fast because only changed blocks need to be sent to S3, and it saves space because you’re only paying for the storage in S3 of the incremental blocks. What is difficult to answer is how much space a snapshot uses. Or, to put it differently, how much space would be saved if a snapshot were deleted. If you delete a snapshot, only the blocks that are only used by that snapshot (i.e. are only referenced by that snapshot’s table of contents) are deleted.
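The sharing of blocks between snapshots, and what deleting one snapshot actually frees, can be sketched with a small Python model. This is a conceptual illustration only: the function names and the "table of contents as a dict" representation are assumptions for the example, not how EBS stores anything internally.

```python
def take_snapshot(snapshots, name, changed_blocks, previous=None):
    """Record a table of contents mapping block number -> stored block id.

    changed_blocks: block numbers written since the previous snapshot.
    """
    # Start from the previous table of contents so unchanged blocks are shared.
    toc = dict(snapshots[previous]) if previous else {}
    for block in changed_blocks:
        toc[block] = f"{name}/block-{block}"  # only changed blocks are copied
    snapshots[name] = toc
    return toc


def freed_by_deleting(snapshots, name):
    """Stored blocks referenced only by this snapshot's table of contents."""
    still_used = {
        stored
        for other, toc in snapshots.items() if other != name
        for stored in toc.values()
    }
    return set(snapshots[name].values()) - still_used


snaps = {}
take_snapshot(snaps, "snap1", changed_blocks=[0, 1, 2])
take_snapshot(snaps, "snap2", changed_blocks=[1], previous="snap1")
take_snapshot(snaps, "snap3", changed_blocks=[2, 3], previous="snap2")

for name in snaps:
    print(name, sorted(freed_by_deleting(snaps, name)))
```

In this toy run, deleting the second snapshot frees nothing, because the one block it copied is still referenced by the third snapshot. That is why the question "how much space does this snapshot use?" has no simple answer.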

Something to be very careful about with snapshots is consistency. A snapshot is taken at a precise moment in time even though the blocks may trickle out to S3 over many minutes. But in most situations you will really want to control what’s on disk vs. what’s in-flight at the moment of the snapshot. This is particularly important when using a database. We recommend you freeze the database, freeze the file system, take the snapshot, then unfreeze everything. At the file system level we’ve been using xfs for all the large local drives and EBS volumes because it’s fast to format and supports freezing. Thus when taking a snapshot we perform an xfs_freeze, take the snapshot, and unfreeze. When running MySQL we also issue “FLUSH TABLES WITH READ LOCK” to briefly halt writes. All this ensures that the snapshot doesn’t contain partial updates that need to be recovered when the snapshot is mounted. It’s like USB dongles: if you pull the dongle out while it’s being written to, “your mileage may vary” when you plug it back into another machine…
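The ordering matters as much as the individual steps: whatever gets frozen must be thawed again even if the snapshot request fails. Here is a minimal Python sketch of that ordering. The four callables are hypothetical stand-ins supplied by the caller (in practice they might wrap a database read lock and a file system freeze); this is not RightScale's actual tooling.

```python
from contextlib import contextmanager


@contextmanager
def quiesced(lock_db, unlock_db, freeze_fs, unfreeze_fs):
    """Hold the database and file system still while the body runs.

    All four arguments are hypothetical caller-supplied callables.
    """
    lock_db()
    try:
        freeze_fs()
        try:
            yield
        finally:
            unfreeze_fs()  # always thaw the file system
    finally:
        unlock_db()        # always release the database lock


log = []
with quiesced(
    lock_db=lambda: log.append("lock db"),
    unlock_db=lambda: log.append("unlock db"),
    freeze_fs=lambda: log.append("freeze fs"),
    unfreeze_fs=lambda: log.append("unfreeze fs"),
):
    log.append("snapshot")  # stand-in for the EBS snapshot request
print(log)
```

The nested try/finally blocks undo the steps in reverse order, so the database is only released after the file system has been thawed.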

Snapshot performance appears to be pretty much gated by the performance of S3, which is around 20MBytes/sec for a single stream. The three big bonuses here are that the snapshot is incremental, that the data is compressed, and that all this is performed in the background by EBS without affecting the instance on which the volume is mounted much. Obviously the data needs to come off the disks, so there is some contention to be expected, but compared to having to do the transfer from disk through the instance to S3 it is like night and day.

Availability Zones

EBS volumes can only be mounted on an instance in the same availability zone, which makes sense when you think of availability zones as being equivalent to datacenters. It would probably be technically possible to mount volumes across zones, but from a network latency and bandwidth point of view it doesn’t make much sense.

The way you get a volume’s data from one zone into another is through a snapshot: you snapshot one volume and then immediately create a new volume in a different zone from the snapshot. We have really gotten away from the idea of unmounting a volume from one instance and then remounting it on the next one: we always go through a snapshot for a variety of reasons. The way we think and operate is as follows:

  • You create a volume, mount it on an instance, format it, and write some data to it.
  • Then you periodically snapshot the volume for backup purposes.
  • If you don’t need the instance anymore, you may terminate it and, after unmounting the volume, you always take a final snapshot. If the instance crashes instead of properly terminating, you also always take a final snapshot of the volume as it was left.
  • When you launch a new instance on which you want the same data, you create a fresh volume from your snapshot of choice. This may be the last snapshot, but it could also be a prior one if it turns out that the last one is corrupt (e.g. in the case of an instance crash or of some software failure).

By creating a volume from the snapshot you achieve two things: first, you are independent of the availability zone of the original volume, and second, you have a repeatable process in case mounting the volume fails, which can easily happen, especially if the unmount wasn’t clean.

Now, of course, in some situations you can directly remount the original volume instead of creating a new volume from a snapshot as an optimization. This applies if the new instance is in the same availability zone, the volume corresponds to the snapshot that we’d like to mount, and the volume is guaranteed not to have been modified since (e.g. by a failed prior mount). It is best to think of the volume as a high-speed cache for the snapshot.
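The three conditions for reusing the original volume can be written down as a small check. This is a minimal sketch under assumptions: the Volume record and its field names are hypothetical bookkeeping you would have to maintain yourself, not something the EBS API hands you in this form.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Volume:
    """Hypothetical bookkeeping record for an existing EBS volume."""
    zone: str                       # availability zone the volume lives in
    source_snapshot: Optional[str]  # snapshot that matches the volume's contents, if any
    modified_since_snapshot: bool   # True if anything may have written to it since


def can_remount_directly(volume: Volume, wanted_snapshot: str, instance_zone: str) -> bool:
    """Return True only if all three conditions from the text hold."""
    return (
        volume.zone == instance_zone                   # same availability zone
        and volume.source_snapshot == wanted_snapshot  # volume matches the snapshot we want
        and not volume.modified_since_snapshot         # untouched since that snapshot
    )
```

If the check returns False for any reason, the fallback is always the same: create a fresh volume from the snapshot. That is what treating the volume as a cache means in practice.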

Price

Estimating the costs of EBS is really quite tricky. The easy part is the storage cost of $0.10 per GB per month. Once you create a volume of a certain size you’ll see the charge. The charge of $0.10 per million I/O transactions is much harder to estimate. To get a rough estimate you can look at /proc/diskstats on your servers. It contains lines like these:

   8  160 sdk 9847 77 311900 56570 1912664 3312437 160672914 211993229 0 1597261 212049797
   8  176 sdl 333 86 4561 1538 895 51 19002 20131 0 4043 21669

which is just a pile of numbers. Following the explanation for the columns, you should sum the first number after the device name (reads completed) and the fifth number (writes completed) to arrive at the number of I/O transactions (9847+1912664 for /dev/sdk above). This is not 100% accurate but should be close (I believe subtracting the 2nd and 6th numbers gets you closer yet, but I prefer an over-estimate). As a point of reference, our main database server is pretty busy and chugs along at an average of 17 transactions per second, which should total around $4.40 per month. But our monitoring servers, prior to some recent optimizations, hammered the disks as fast as they would go at over 1000 random writes per second sustained 24×7. That would end up costing over $250 per month! As far as I can tell, for most situations the EBS transaction costs will be in the noise, but you can make it expensive if you’re not careful.
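To make the arithmetic concrete, here is a rough sketch in Python. The function names are made up for this example, and the 30-day month is an assumption. Also keep in mind that the /proc/diskstats counters are cumulative since boot, so to get a per-second rate you need to divide by uptime or take the difference between two readings.

```python
PRICE_PER_MILLION_IO = 0.10         # dollars, the EBS price quoted above
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month


def io_transactions(diskstats_line: str) -> int:
    """Sum reads completed and writes completed from one /proc/diskstats line."""
    fields = diskstats_line.split()
    # fields[0..2] are major number, minor number, and device name;
    # the statistics start at fields[3].
    reads_completed = int(fields[3])   # 1st number after the device name
    writes_completed = int(fields[7])  # 5th number after the device name
    return reads_completed + writes_completed


def monthly_io_cost(transactions_per_second: float) -> float:
    """Estimated monthly I/O charge in dollars for a sustained transaction rate."""
    transactions_per_month = transactions_per_second * SECONDS_PER_MONTH
    return transactions_per_month / 1_000_000 * PRICE_PER_MILLION_IO
```

For the /dev/sdk line above, io_transactions gives 9847 + 1912664 = 1922511 transactions since boot. With a 30-day month, monthly_io_cost(17) comes to about $4.41 and monthly_io_cost(1000) to about $259, which matches the figures quoted above.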

The cost of snapshots is harder to estimate due to their incremental nature. First of all, only the blocks written are captured on S3 (i.e. blocks on the volume that have never been written are not stored on S3). Second, it’s tricky to talk about the cost of a single snapshot because snapshots share blocks with one another incrementally.

Summing it up

All in all it’s amazing how simple EBS is, yet how complex a universe of options it opens. Between snapshots, availability zones, pricing, and performance there are many options to consider and a lot of automation to provide. Of course at RightScale we’re busy working out a lot of these for you, but beyond that it is not an overstatement to say that Amazon’s Elastic Block Store brings cloud computing to a whole new level. I’ll repeat what I’ve said before: if you’re using traditional forms of hosting it’s gonna get pretty darn hard for you to keep up with the cloud, and you’ve probably already fallen behind at this point!

Cloud Computing wouldn’t exist without Open Source

I’m at OSCON this week drinking from the open source that made RightScale possible. In talking to Tim O’Reilly I noticed that he hadn’t realized how integral Open Source is to the cloud. So maybe this isn’t as obvious as I thought and worth writing a blog entry about.

Cloud Computing is all about the flexibility to launch and terminate servers on demand, or more generally, to acquire and release resources on demand. This can help solve many tricky problems, from reliability, scaling, development, and testing to business flexibility needs. Where open source comes into the picture is when you think about the software licenses for the software stacks you’re running on all the servers you’re launching. If you are normally running 2 servers but today you need 10, did you consider whether you have licenses for all the software on the additional 8 servers? Most commercial software seems to be licensed by the server or by the CPU, and obviously this just doesn’t cut it in the cloud. If it weren’t for open source stacks no production service would be operating in the cloud today; everyone would still be waiting for software vendors to ‘get it’ and change their licenses to enable efficient use in the cloud (yeah, right…).

But all this is starting to change. The vast majority of software vendors we talk to are in the process of trying to figure out how they can sell their software in the cloud. They are asking what technical changes are necessary to enable their customers to deploy their software into the cloud environment, and what business model changes are necessary to offer frictionless sales into the cloud. Of course deploying software on the RightScale platform offers a number of benefits, including some new features we’re currently adding to support publishing and charging by use. But the bottom line really is that without open source we wouldn’t have cloud computing today.

Cloud Computing vs. Grid Computing

Recently Rich Wolski (UCSB Eucalyptus project) and I were discussing grid computing vs. cloud computing. An observation he made makes a lot of sense to me. Since he doesn’t blog [...], let me repeat here what he said. Grid computing has been used in environments where users make few but large allocation requests. For example, a lab may have a 1000-node cluster and users make allocations for all 1000, or 500, or 200, etc. So only a few of these allocations can be serviced at a time and others need to be scheduled for when resources are released. This results in sophisticated batch job scheduling algorithms for parallel computations.

Cloud computing really is about lots of small allocation requests. The Amazon EC2 accounts are limited to 20 servers each by default and lots and lots of users allocate up to 20 servers out of the pool of many thousands of servers at Amazon. The allocations are real-time and in fact there is no provision for queueing allocations until someone else releases resources. This is a completely different resource allocation paradigm, a completely different usage pattern, and all this results in a completely different method of using compute resources.

I always come back to this distinction between cloud and grid computing when people talk about “in-house clouds.” It’s easy to say “ah, we’ll just run some cloud management software on a bunch of machines,” but it’s a completely different matter to uphold the premise of real-time resource availability. If you fail to provide resources when they are needed, the whole paradigm falls apart and users will start hoarding servers, allocating for peak usage instead of current usage, and so forth.
