Monday, December 3, 2012

Part 1: Understanding Amazon ElastiCache Internals: Connection overheads



An Amazon ElastiCache node (Memcached engine) uses a connection buffer per TCP connection to read and write data over the network. This is not a problem when there are only a few concurrent connections to a node, but once you get into hundreds of thousands of connections across hundreds of ElastiCache nodes, the per-connection memory adds up to a significant overhead.

On each ElastiCache node, the memory available for storing cache items is the total memory on that node minus the memory used for connections and other overhead (such as TCP connection buffers). This overhead is controlled by the memcached_connections_overhead parameter. For example, a cache node of type cache.m3.2xlarge has a max_cache_memory of 29600 MB; with the default memcached_connections_overhead value of 100 MB, the Memcached process has 29500 MB available to store cache items.

The default of 100 MB satisfies most use cases, but the amount of memory actually needed for connection overhead varies with the request rate, payload size, and number of connections. For a heavily cache-dependent site with high concurrency running multiple cache.m3.2xlarge nodes, 100 MB of overhead may not be enough, and exceeding it can cause swapping and degrade performance. That is why Amazon ElastiCache makes this overhead a user-configurable parameter; note that the change affects all cache nodes in the cluster. Monitor swap usage, latency, and the number of concurrent connections with Amazon CloudWatch periodically, and increase the parameter accordingly (from a few hundred MB up to a GB or more, depending on the readings).
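To see what this looks like in practice, here is a minimal sketch using the AWS SDK for Python (boto3, which post-dates this post) that raises memcached_connections_overhead on a cache parameter group and then pulls the SwapUsage metric from CloudWatch. The parameter group name, cluster ID, node ID and the 512 MB value are placeholders for illustration, not recommendations.

    import datetime

    import boto3

    elasticache = boto3.client("elasticache", region_name="us-east-1")
    cloudwatch = boto3.client("cloudwatch", region_name="us-east-1")

    # Raise the memory reserved for connection overhead from the default
    # 100 MB to 512 MB. "my-memcached-params" is a placeholder parameter
    # group name; the change applies to every node in every cluster that
    # uses this parameter group.
    elasticache.modify_cache_parameter_group(
        CacheParameterGroupName="my-memcached-params",
        ParameterNameValues=[
            {
                "ParameterName": "memcached_connections_overhead",
                "ParameterValue": "512",
            }
        ],
    )

    # Check the last hour of SwapUsage (bytes) for one cache node; the same
    # call with MetricName="CurrConnections" reports concurrent connections.
    end = datetime.datetime.utcnow()
    start = end - datetime.timedelta(hours=1)
    stats = cloudwatch.get_metric_statistics(
        Namespace="AWS/ElastiCache",
        MetricName="SwapUsage",
        Dimensions=[
            {"Name": "CacheClusterId", "Value": "my-cache-cluster"},
            {"Name": "CacheNodeId", "Value": "0001"},
        ],
        StartTime=start,
        EndTime=end,
        Period=300,
        Statistics=["Maximum"],
    )
    for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
        print(point["Timestamp"], point["Maximum"], "bytes of swap in use")

If SwapUsage stays above zero or latency climbs while CurrConnections is high, that is the signal to grow the overhead reservation.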

Instead of spending memory on connection buffers and related overhead, it would be better to use that memory to store user data. Some techniques employed in the industry to reclaim this memory:
· Implement a per-thread shared connection buffer pool for TCP and UDP sockets. This change can reclaim multiple gigabytes of memory per server. The Facebook engineering team published a patch for a per-thread shared connection buffer pool on their engineering blog; it is worth checking with the AWS ElastiCache team whether this patch, or equivalent tuning, is applied on Amazon ElastiCache nodes to reduce this overhead. (A rough estimate of how much memory these buffers can consume is sketched after this list.)
· Use UDP wherever it is applicable and feasible. UDP is not currently supported by Amazon ElastiCache, but it may become available in the future.
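To get a feel for why these buffers matter at scale, here is a rough back-of-envelope sketch in Python. The per-connection buffer size, connection count, and node count are all assumed figures chosen for illustration, not measured ElastiCache or Memcached values.

    # Rough, illustrative estimate of how much node memory per-connection
    # buffers can consume. The buffer size, connection count, and node count
    # below are assumptions for the sake of the example, not published
    # ElastiCache or Memcached figures.

    PER_CONNECTION_BUFFER_KB = 4      # assumed ~2 KB read + ~2 KB write buffer
    CONNECTIONS_PER_NODE = 50_000     # assumed high-concurrency workload
    NODES = 20                        # assumed cluster size

    overhead_per_node_mb = CONNECTIONS_PER_NODE * PER_CONNECTION_BUFFER_KB / 1024.0
    total_overhead_gb = overhead_per_node_mb * NODES / 1024.0

    print(f"~{overhead_per_node_mb:.0f} MB of connection buffers per node")
    print(f"~{total_overhead_gb:.1f} GB across the cluster that a shared "
          f"per-thread buffer pool could largely reclaim")

Even with these modest assumptions the per-node figure already exceeds the default 100 MB memcached_connections_overhead, which is why a shared buffer pool (or a larger overhead reservation) matters at this scale.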


