RightScale Blog

Cloud Management Blog

Thoughts About the Amazon Simple Notification Service

Amazon just released a new service, the Simple Notification Service (SNS), which is a distributed message delivery service roughly similar to AMQP- or JMS-based messaging services. It uses a publish-subscribe paradigm and supports push delivery of notifications using HTTP and email. It seems mostly targeted at back-end applications, such as servers sending notifications to one another, but given that it has an email delivery mechanism it can also be used to deliver notifications to users.
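To make the publish-subscribe and push vocabulary concrete, here is a toy, in-process broker. It is only an illustration of the paradigm, not how SNS works internally, and the class and method names (`ToyTopicBroker`, `subscribe`, `publish`) are hypothetical. The point is that publishers only know the topic, and the broker calls (pushes to) every subscriber rather than subscribers polling for messages.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber is modeled as a callback that receives the pushed message.
Subscriber = Callable[[str], None]


class ToyTopicBroker:
    """Hypothetical in-process stand-in for a publish-subscribe service (not SNS)."""

    def __init__(self) -> None:
        self._topics: DefaultDict[str, List[Subscriber]] = defaultdict(list)

    def subscribe(self, topic: str, subscriber: Subscriber) -> None:
        """Register a callback to be pushed every message published to `topic`."""
        self._topics[topic].append(subscriber)

    def publish(self, topic: str, message: str) -> int:
        """Push the message to every subscriber of the topic; return the fan-out count."""
        subscribers = self._topics.get(topic, [])
        for deliver in subscribers:
            deliver(message)  # push: the broker calls the subscriber, not vice versa
        return len(subscribers)


# Usage: two subscribers on one topic, so a single publish yields two pushes.
broker = ToyTopicBroker()
received: List[str] = []
broker.subscribe("orders", received.append)
broker.subscribe("orders", lambda m: received.append("audit:" + m))
count = broker.publish("orders", "order-42 shipped")
```

A real service adds everything this toy leaves out: network transports (HTTP, email), access control, persistence, and retries.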

Jeff Barr wrote an article describing SNS. The key points are:

  • You can create topics and publish messages to these topics.
  • Others can subscribe to the topics and they will get messages pushed to them.
  • Messages can be pushed over HTTP or SMTP (email).
  • Using access control policies one can control who is allowed to subscribe to a topic.
  • The SNS system is redundant and retries message delivery if necessary.
  • The cost at volume is $0.06 per 100,000 messages published and $0.06 per 100,000 HTTP message pushes. Email pushes cost 33x more. The good news is that the first 100,000 messages published and pushed are free.
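Because each publish to a topic turns into one push per subscriber, push charges grow with fan-out. The rough estimator below uses the prices quoted above. Two things in it are my own assumptions, not AWS statements: the email rate is derived from the "33x" figure (33 × $0.06), and the free tier is read as covering the first 100,000 publishes and, separately, the first 100,000 HTTP pushes, with no free email pushes. The function names are hypothetical.

```python
# Prices as quoted in this post, in USD per 100,000 operations.
PUBLISH_PER_100K = 0.06
HTTP_PER_100K = 0.06
EMAIL_PER_100K = 0.06 * 33  # assumption: derived from "33x more", about $1.98
FREE_UNITS = 100_000        # free tier as described in the post


def billable(units: int, free: int) -> int:
    """Units left over after the free allowance is used up."""
    return max(0, units - free)


def sns_monthly_cost(published: int, http_pushes: int, email_pushes: int = 0) -> float:
    """Rough cost estimate in USD.

    Assumption: the free tier applies separately to publishes and to HTTP
    pushes; email pushes get no free tier in this sketch.
    """
    cost = billable(published, FREE_UNITS) / 100_000 * PUBLISH_PER_100K
    cost += billable(http_pushes, FREE_UNITS) / 100_000 * HTTP_PER_100K
    cost += email_pushes / 100_000 * EMAIL_PER_100K
    return round(cost, 4)
```

For example, 1,000,000 publishes to a topic with three HTTP subscribers means 3,000,000 pushes, so `sns_monthly_cost(1_000_000, 3_000_000)` comes to about $2.28 under these assumptions ($0.54 for publishing, $1.74 for pushing).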

It is great that AWS provides such a service. It's relatively easy to fire up a messaging server, such as RabbitMQ, but it's a different story to set up a redundant, scalable messaging system. While this can be done with RabbitMQ, for many users having it provided as a service is the right way to go.

Unfortunately SNS does not use a standard messaging API; it's all proprietary. This is a major weakness of SQS, SNS, and SDB (SimpleDB): once you use the interface you're locked into AWS. Granted, the SNS interface isn't particularly big, but then why did they have to roll their own?
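Since the interface is small, one way to limit the lock-in is to keep it behind a thin interface of your own, so only one class knows about SNS. The sketch below assumes a boto3-style SNS client passed in by the caller (boto3's SNS client has a `publish` method that accepts `TopicArn` and `Message` and returns a dict containing `MessageId`). The `Notifier` and `SnsNotifier` names, the topic mapping, and the example ARN (using AWS's documentation account ID) are hypothetical, and a fake client stands in for AWS so the example runs offline.

```python
from typing import Any, Dict, List, Protocol


class Notifier(Protocol):
    """Application-facing interface; nothing here is AWS-specific."""

    def publish(self, topic: str, message: str) -> str:
        """Publish a message to a logical topic and return a message ID."""
        ...


class SnsNotifier:
    """Adapter that confines the proprietary SNS call to one place."""

    def __init__(self, client: Any, topic_arns: Dict[str, str]) -> None:
        self._client = client          # assumed: a boto3-style SNS client
        self._topic_arns = topic_arns  # logical topic name -> SNS topic ARN

    def publish(self, topic: str, message: str) -> str:
        response = self._client.publish(TopicArn=self._topic_arns[topic], Message=message)
        return response["MessageId"]


class FakeSnsClient:
    """Test double that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, str]] = []

    def publish(self, **kwargs: str) -> Dict[str, str]:
        self.calls.append(kwargs)
        return {"MessageId": f"msg-{len(self.calls)}"}


def announce(notifier: Notifier, text: str) -> str:
    """Application code depends only on the Notifier interface."""
    return notifier.publish("alerts", text)


fake = FakeSnsClient()
notifier = SnsNotifier(fake, {"alerts": "arn:aws:sns:us-east-1:123456789012:alerts"})
msg_id = announce(notifier, "disk almost full")
```

Moving to another messaging system would then mean writing one more adapter (say, for RabbitMQ) rather than touching every call site, although any behavior that depends on SNS's delivery semantics would still need to be re-checked.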

My biggest beef with SNS is what is being said, or more precisely what is not being said, about reliability. I have no reason to believe that SNS doesn't do all the right things, but AWS isn't very forthcoming with specifics. Here is what the SNS docs state:

  • "Reliable – Amazon SNS runs within Amazon’s proven network infrastructure and datacenters, so topics will be available whenever applications need them. To prevent messages from being lost, all messages published to Amazon SNS are stored redundantly across multiple servers and data centers." source
  • "Although most of the time each message will be delivered to your application exactly once, the distributed nature of Amazon SNS and transient network conditions could result in occasional, duplicate messages at the subscriber end." source
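Occasional duplicates mean an HTTP subscriber should be idempotent. A common approach, sketched here under the assumption that each delivery carries a stable unique ID (current SNS HTTP notifications include a `MessageId` field in their JSON body), is to remember recently seen IDs and skip repeats. The `DedupingHandler` class and its bounded window are hypothetical choices of mine; a production system would more likely keep this state in a shared store so that several subscriber instances agree.

```python
from collections import OrderedDict
from typing import Callable, List


class DedupingHandler:
    """Wraps a message handler so redelivered duplicates are processed only once.

    Only the last `capacity` message IDs are remembered, so a duplicate that
    arrives after falling out of that window would be processed again.
    """

    def __init__(self, handler: Callable[[str], None], capacity: int = 10_000) -> None:
        self._handler = handler
        self._capacity = capacity
        self._seen: "OrderedDict[str, None]" = OrderedDict()

    def __call__(self, message_id: str, body: str) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # refresh its position in the window
            return False
        # Process first, record second: if the handler raises, the ID is not
        # recorded, so a later redelivery gets processed (at-least-once).
        self._handler(body)
        self._seen[message_id] = None
        if len(self._seen) > self._capacity:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


# Usage: the second delivery of "m1" is recognized and skipped.
processed: List[str] = []
handle = DedupingHandler(processed.append, capacity=2)
results = [handle("m1", "a"), handle("m1", "a"), handle("m2", "b")]
```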

So there is talk about redundant storage, at-least-once delivery, and delivery retries. But what I really want to know is something more concrete than all this fuzzy feel-good stuff. The question is not that difficult:

  • If SNS returns an HTTP "200 OK" to my publish request, what is the probability that each subscriber will receive at least one delivery attempt?

OK, I guess I really need to factor in time, which would also give an indication of performance:

  • If SNS returns an HTTP "200 OK" to my publish request, what is the probability distribution over time that each subscriber has received at least one delivery attempt?

This would let me reason about what I can and cannot use SNS for, and whether I need a backup synchronization mechanism. If the story stays at the warm and fuzzy level, AWS could at least specify when messages are stored redundantly, whether a redundant copy exists by the time I get an HTTP "200 OK" response, and how long and how often retries are made. (I'm focusing on HTTP delivery; I don't think it makes much sense to talk about email delivery reliability.)
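To show the kind of reasoning those specifics would enable, here is a toy model of the time-dependent question. Every parameter is a hypothetical placeholder, not an AWS figure: `p_stored` is the probability the message survives once the "200 OK" is returned, `attempt_times` is the retry schedule in seconds, and each attempt is assumed to succeed independently with probability `p_success` (q). Under those assumptions, the probability that one subscriber has received a delivery by time t is p_stored × (1 − (1 − q)^n(t)), where n(t) is the number of attempts made by t.

```python
from bisect import bisect_right
from typing import Sequence


def p_delivered_by(t: float, attempt_times: Sequence[float], p_success: float,
                   p_stored: float = 1.0) -> float:
    """P(one subscriber has received a delivery by time t) in a toy model.

    attempt_times must be sorted; attempts are assumed independent.
    """
    attempts = bisect_right(attempt_times, t)  # attempts scheduled at or before t
    return p_stored * (1.0 - (1.0 - p_success) ** attempts)


def p_all_delivered_by(t: float, attempt_times: Sequence[float], p_success: float,
                       subscribers: int, p_stored: float = 1.0) -> float:
    """P(every subscriber has received a delivery by time t).

    Losing the stored message affects all subscribers at once, so p_stored
    is applied once rather than once per subscriber.
    """
    per_subscriber = p_delivered_by(t, attempt_times, p_success)  # given it survived
    return p_stored * per_subscriber ** subscribers


# Hypothetical retry schedule (seconds after the 200 OK) and per-attempt success rate.
schedule = [0, 20, 60, 180]
one = p_delivered_by(30, schedule, p_success=0.9)
all_three = p_all_delivered_by(30, schedule, p_success=0.9, subscribers=3)
```

With these made-up numbers, two attempts have happened by t = 30 s, so one subscriber has a delivery with probability 0.99 and all three with about 0.970. The independence assumption is the weakest part: a subscriber that is down for an hour makes consecutive attempts fail together, which is exactly why the length of the retry schedule matters.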

I hope that many people will ask AWS to be more specific about the SLA offered by SNS (and by some of the other AWS services). I'm not asking for damages if the SLA isn't met; I just want to know what AWS publicly holds itself accountable for, and thus what I can design for and apply my "trust in AWS" judgement to. While the service is in beta, the SLA might simply be a target.