Saturday, August 3, 2013

Load Balancing Amazon RDS Read Replicas Using HAProxy


When you architect a read-intensive online application in the AWS cloud, you can employ techniques such as CDNs and caching to improve the overall concurrency and performance of the application. Another age-old technique is scaling out database read slaves. For this, Amazon RDS MySQL offers a Master with Read Replica slaves. Depending on the read intensity and concurrency needed, you can scale out by adding more Read Replicas to the RDS MySQL Master. Ideally, 1 to 5 Read Replicas can be placed with an RDS MySQL Master for performance. If more than 5 Read Replicas are required, it is likely a sign of bad design, because of the load this puts on the Master, the replication lag, and the overall manageability of the tier itself. In that case I would suggest functionally partitioning your database or adding other high-performance, scale-out data stores such as a caching layer, MongoDB, DynamoDB or Redis to your architecture.
Now let us define a common deployment architecture for a read-intensive site:

  • The entire setup is inside an Amazon VPC.
  • Your Web App is deployed in Amazon Auto Scaling mode.
  • You have 2 or more RDS Read Replicas attached to your RDS Master.

So far so good. The question now is: how do you load balance requests across your Read Replicas, and what happens when you elastically scale out new ones?

Several architectural techniques can solve this problem, from embedding load-balancing plugins in PHP to introducing HAProxy in between.
In this post, let us explore how HAProxy can be used to solve it in the Amazon cloud.

Architecture 1:  HAProxy as a Separate Tier

  • Web/App EC2 instances are deployed in the public subnet of an Amazon VPC in Auto Scaling mode.
  • The RDS MySQL Master and 2 Read Replicas are deployed across multiple subnets in Multi-AZ mode.
  • Programmatic changes, plugins or some other suitable mechanism are built into the Web/App tier to separate writes from reads. All writes go to the Master and reads go to the RDS Read Replicas.
  • Web/App EC2 instances point to the HAProxy EC2 address. The HAProxy load balancer is provisioned in a separate tier to load balance internal requests from the Auto Scaled Web/App EC2 instances to the RDS Read Replicas.
  • Two or more HAProxy instances are deployed to avoid a single point of failure in this tier.
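
The write/read separation mentioned above could be sketched in the application roughly as follows. This is a minimal illustration, not the mechanism used in this setup: the endpoint names, the list of read-only keywords and the function names are all assumptions made up for the example.

```python
# Hypothetical endpoints: replace with your RDS Master endpoint and HAProxy address.
MASTER_ENDPOINT = ("master.example.internal", 3306)
HAPROXY_READ_ENDPOINT = ("haproxy.example.internal", 3306)

# Statement prefixes treated as read-only in this sketch (an assumption).
READ_PREFIXES = ("select", "show", "describe", "explain")


def is_read_query(sql: str) -> bool:
    """Return True when the statement looks read-only."""
    stripped = sql.lstrip().lower()
    # SELECT ... FOR UPDATE takes locks, so it is treated as a write here.
    return stripped.startswith(READ_PREFIXES) and "for update" not in stripped


def endpoint_for(sql: str, in_transaction: bool = False):
    """Pick the (host, port) a statement should be sent to.

    Statements inside a transaction always go to the Master so that a
    transaction never straddles two servers.
    """
    if in_transaction or not is_read_query(sql):
        return MASTER_ENDPOINT
    return HAPROXY_READ_ENDPOINT
```

Keep in mind that replicas can lag behind the Master, so a read that must see a write made a moment ago may still need to go to the Master.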
The diagram below illustrates this architecture technique:
  
Now let us analyse this architecture technique:

  • This technique is widely used in AWS cloud environments for such problems. You can use it with RDS MySQL or with MySQL on EC2.
  • New RDS Read Replicas can be added and removed elastically depending on traffic, without modifying the app configuration files. HAProxy configuration entries can be hot deployed.
  • A minimum of 2 HAProxy EC2 instances is needed to avoid a single point of failure in this tier.
  • HAProxy can be deployed across multiple AZs and multiple subnets for better HA.
  • It is recommended to start with m1.large for the HAProxy EC2 instances and scale up the instance type depending on traffic/concurrency. Note: m1.small, m1.medium etc. have moderate I/O bandwidth and may degrade performance between the Read Replicas and the App tier.
  • Frequent scale-up of the HAProxy EC2 instances to a larger instance type might be needed if hundreds of Web/App EC2 instances are auto scaled every day.
  • Logic has to be built into the App tier to redirect traffic to the secondary HAProxy if the primary HAProxy fails.
  • If both HAProxy instances are used actively, logic has to be built into the App tier to use both of them.
  • There is an additional price to pay for 2 or more m1.large HAProxy EC2 instances.
  • There is the additional cost of monitoring and managing this HAProxy tier.
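
The failover logic the App tier needs in this model might look something like the sketch below. It simply tries each HAProxy address in order and returns the first one that accepts a TCP connection. The function name and the idea of probing with a plain TCP connect are assumptions for illustration; a real application would more likely fold this into its database connection handling.

```python
import socket


def first_reachable(endpoints, timeout=1.0):
    """Return the first (host, port) that accepts a TCP connection, else None."""
    for host, port in endpoints:
        try:
            # Open and immediately close a connection just to probe the port.
            with socket.create_connection((host, port), timeout=timeout):
                return (host, port)
        except OSError:
            # Refused, timed out or unresolvable: move on to the next HAProxy.
            continue
    return None
```

For the active-active case, the same function can be called with the endpoint list shuffled on each call so that both HAProxy instances receive traffic.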

Architecture 2: HAProxy Embedded in the Web/App Tier

  • Web/App EC2 instances are deployed in the public subnet of an Amazon VPC in Auto Scaling mode.
  • The RDS MySQL Master and 2 Read Replicas are deployed across multiple subnets in Multi-AZ mode.
  • HAProxy is installed/bundled with every Auto Scaled Web/App EC2 instance.
  • Every Web/App EC2 instance points to its own local HAProxy. HAProxy load balances requests from that Web/App EC2 instance to any of the RDS Read Replicas.

   
Now let us analyse this architecture technique:
  • This technique is not as widely used as the previous one, probably because not many have thought of or implemented it along these lines. But I found it simple and manageable in larger AWS production deployments.
  • New RDS Read Replicas can be added and removed elastically depending on traffic, without modifying the app configuration files. Read Replica endpoints can be propagated to HAProxy using Chef and then hot deployed in HAProxy.
  • No single point of failure: if your Web/App EC2 instance fails, its HAProxy fails with it, and only that instance is affected. HAProxy rarely fails on its own and is very stable.
  • No additional HAProxy EC2 instances are needed, hence lower cost and easier manageability.
  • I found this embedded technique really useful in larger, auto scaled AWS production deployments.
  • This technique performs better when you use larger Web/App EC2 instances such as m1.xlarge or c1.xlarge. HAProxy uses very little CPU and memory and takes advantage of the larger I/O bandwidth that comes with larger instance types. It is not recommended when your Web/App tier runs on smaller instances such as m1.small or m1.medium, because of resource contention.
  • Lower response latency because there is one less network hop.
  • No scale-up of HAProxy is required. HAProxy is a very lightweight and very stable process. In the embedded model it scales easily with your application's needs.
  • No complex logic has to be built into the Web/App tier. The instances simply contact the local HAProxy and it does the rest.


Sample Configuration Steps for Architecture Technique 2:

Setup Details:

  • Web/App EC2 (Amazon Linux): 2 (can also run under Amazon Auto Scaling in production)
  • RDS MySQL Master DB instance: 1
  • RDS MySQL Read Replicas: 2 to 5 (use larger instance types for production)
  • HAProxy runs on each Web/App EC2 instance
  • The versions, instance types and configurations used below are for illustrative purposes only. Note: production use might need some modifications.

Step 1: Creating Read Replicas:
Create two Read Replicas from the RDS MySQL Master DB instance. To create a MySQL Read Replica, navigate to the Amazon RDS dashboard, select the RDS MySQL Master and use the “Create Read Replica” option. On successful creation, you get an endpoint for each Read Replica. The screenshots below illustrate this.
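
If you would rather script this step than click through the console, something along the following lines should work with the AWS SDK for Python. Treat it as a sketch under assumptions: the function name and the replica naming scheme are made up for illustration, and `rds_client` is expected to behave like `boto3.client("rds")`.

```python
def create_read_replicas(rds_client, master_id, count, instance_class):
    """Request `count` Read Replicas of `master_id` and return their identifiers.

    The naming scheme (<master_id>-r1, -r2, ...) is an assumption of this sketch.
    """
    replica_ids = []
    for n in range(1, count + 1):
        replica_id = f"{master_id}-r{n}"
        # Each call asks RDS to start creating one replica of the Master.
        rds_client.create_db_instance_read_replica(
            DBInstanceIdentifier=replica_id,
            SourceDBInstanceIdentifier=master_id,
            DBInstanceClass=instance_class,
        )
        replica_ids.append(replica_id)
    return replica_ids


# Usage (requires boto3 and AWS credentials; identifiers are placeholders):
#   import boto3
#   create_read_replicas(boto3.client("rds"), "sample", 2, "db.m1.large")
```

Replica creation takes a while, so the endpoints are only available once each replica has finished provisioning.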






Step 2: Installing HAProxy on Web/App EC2
HAProxy can be installed from source or from the repository. We installed it from the repository. To install HAProxy from the repository and start it, use the following commands as root:

# yum install haproxy
# service haproxy start

Step 3: HAProxy Configuration on a Single Web/App EC2
The HAProxy configuration file is available at the following location:
 /etc/haproxy/haproxy.cfg
The configuration file has several sections, such as global, defaults and listen. In each section you may need to specify some parameters and values.
In the listen section, specify the RDS MySQL port (3306) and the user for mysql-check. The "mysql-check" option checks the health of the back-end Read Replica nodes. For the health check to work, create a user with no password on the RDS MySQL Master and use it as the mysql-check user. This user is automatically propagated to the Read Replicas as well. The load-balancing algorithm used here is round robin.
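
To make the combination of round robin and health checks concrete, here is a small conceptual model. It is not how HAProxy is implemented internally, only a sketch of the behaviour you can expect: servers are handed out in turn, and a server that fails its health check is skipped until it comes back. The class and method names are made up for the example.

```python
class RoundRobinPool:
    """Cycle through servers in order, skipping any marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the server to try next

    def mark_down(self, server):
        """Record a failed health check."""
        self.healthy.discard(server)

    def mark_up(self, server):
        """Record a passing health check for a known server."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, or None if all are down."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        return None
```

With two replicas, db01 and db02, successive calls to `pick()` alternate between them; after `mark_down("db02")`, every call returns db01 until db02 is marked up again.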

Sample configuration file:
##/etc/haproxy/haproxy.cfg##

global

log         127.0.0.1 local2 debug
chroot      /var/lib/haproxy
pidfile      /var/run/haproxy.pid
maxconn     4000
daemon

defaults
mode        tcp
log         global
option tcplog
timeout connect 10000 # wait up to 10 seconds (value in ms) for a connection to a backend server
timeout client 300000
timeout server 300000
maxconn     20000

# For Admin GUI
listen stats
bind :8080
mode http
stats enable
stats uri /stats

listen mysql *:3306
mode tcp
balance roundrobin
option mysql-check user check
option log-health-checks
server db01 sample-r1.XXXX.amazonaws.com:3306 check port 3306 inter 1000
server db02 sample-r2.XXXX.amazonaws.com:3306 check port 3306 inter 1000

Use the following statements to create the health-check user on the Master:
use mysql;
create user check;
insert into user (Host,User) values ('<IP/RANGE_OF_HAPROXIES>','check');
FLUSH PRIVILEGES;

flush hosts;

For production use in Amazon Web Services, the HAProxy configuration file and setup can be propagated using Chef.
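
Whatever tool does the propagation, the core of it is regenerating the listen section from the current list of Read Replica endpoints. The sketch below shows that idea in plain Python as a stand-in for a Chef template. The function name and its parameters are assumptions for illustration, and the output mirrors the sample configuration above.

```python
def render_mysql_listen(replica_endpoints, check_user="check", port=3306, inter_ms=1000):
    """Build the `listen mysql` section for the given replica hostnames."""
    lines = [
        f"listen mysql *:{port}",
        "mode tcp",
        "balance roundrobin",
        f"option mysql-check user {check_user}",
        "option log-health-checks",
    ]
    # One server line per replica, named db01, db02, ... in list order.
    for n, host in enumerate(replica_endpoints, start=1):
        lines.append(f"server db{n:02d} {host}:{port} check port {port} inter {inter_ms}")
    return "\n".join(lines) + "\n"
```

The rendered text would replace the listen section in /etc/haproxy/haproxy.cfg, after which HAProxy has to reload its configuration to pick up the added or removed replicas.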

The Web/App process must be configured to use HAProxy for its MySQL read connections. Once the setup is running, the HAProxy admin page shows the sessions distributed evenly across the Read Replicas in round-robin fashion.
Admin URL Page:
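
The same numbers can also be read without a browser, since the HAProxy stats page can export CSV when ";csv" is appended to the stats URI. The sketch below parses that CSV and returns the total session count per server for one proxy. The function name is made up, and the URL in the usage comment assumes the stats listener from the sample configuration (port 8080, URI /stats).

```python
import csv
import io


def sessions_per_server(stats_csv, proxy="mysql"):
    """Return {server_name: total_sessions} for one proxy from HAProxy stats CSV."""
    text = stats_csv.lstrip()
    if text.startswith("# "):
        text = text[2:]  # the header line starts with "# pxname,svname,..."
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        if row["pxname"] != proxy:
            continue
        if row["svname"] in ("FRONTEND", "BACKEND"):
            continue  # skip the aggregate rows, keep the per-server rows
        totals[row["svname"]] = int(row["stot"] or 0)
    return totals


# Usage (assumes the stats listener from the sample configuration):
#   from urllib.request import urlopen
#   body = urlopen("http://127.0.0.1:8080/stats;csv").read().decode()
#   print(sessions_per_server(body))
```

With round robin and healthy replicas, the counts for db01 and db02 should stay close to each other.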


2 comments:

Sanket Anavkar said...

Great Post !!
In the second architecture, do I still have to manage the 'write only to master' in my application logic ?

Bibin Wilson said...

Really a good post!!
