What ProxySQL solves for us and how?
For more detailed info about, you may visit https://proxysql.com/ (opens new window)
ProxySQL lets us automate and secure the access to our DB instances.
It's design and the multi layer configuration system (opens new window)
allow us to easily automate and customize the ProxySQL instances deployment and configuration via terraform (opens new window) and other automation tools.
# Problem
As our databases grew, the administration and security began to cause issues, especially in terms of time. We had thousands of DB users across the whole infrastructure, and these users had several IPs whitelisted. Then, we started looking for a proxy service that could allow us to manage all these users and have total control over them.
And we found ProxySQL.
# What is ProxySQL, and how does it help us?
It is a high-performance SQL proxy. ProxySQL accepts incoming traffic and forwards it to backend MySQL / MariaDB / Percona servers. In case of any failure in the servers, it routes the traffic to another highly available server. Thus ProxySQL ensures that there is no single point of failure.
ProxySQL mainly helps us to have high availability, manage DB users and increase the security of our data. And we have all automated π
Yes, it is open source and free.
# Let's go deeper π
We have incoming traffic from the development team (personal DB users), multiple projects/applications, data team and various third party analytical tools, like Tableau (opens new window) and warehouse solutions as Amazon Redshift (opens new window). As commented before, we ran into the problem of having thousands of user accounts to manage.
So, how do we solve this?
ProxySQL allows us to have very granular DB accounts without having multiple IP ranges whitelisted in the database as all traffic goes through the proxySQL node. Then we open (or close) the ProxySQL listening ports for the IP ranges at the ProxySQL instance firewall / security group.
Easy, fast and with minimal maintenance.
- AWS We have proxySQL service installed on an EC2 instance (Jumpserver) along with multiple DB tools and scripts that help the DBA team to have a better and easier management of the databases. Those EC2 instances are completely automated using Terrafom, Ansible, Packer and CircleCi.
- Checks if a remote file system is mounted. If not, creates the config files and mounts it.
- Checks the connectivity to the RDS.
- Checks the ProxySQL configuration.
- Syncs the DB users from the RDS to the ProxySQL.
Once a DB users is dropped from the DB backend, they are automatically dropped from ProxySQL too. - Checks the monitoring agent configuration
- Moves all the changes to the ProxySQL config file, and creates an encrypted backup stored at the remote files system
- On premise
- We have two ProxySQL instances running in Production. Both installed along with master-to-master MariaDB instances in the same servers.
- Active master balancing SQL query read load through the standby master and DB replicas and write load to the active master
- Stanby master balancing SQL query read load through the standby master and DB replicas.
- Three DB replicas balancing SQL query read load through them depending on the traffic.
A ProxySQL bash script created by the DBA team is installed during the EC2 instance creation and does the following:
ProxySQL allows us to create a high availability DB endpoints.
It's main job is, according to the https://proxysql.com/ (opens new window) is:
Intelligently serving multiple frontend connections using a single backend connection.
And does it very well. Mixed with AWS or HAproxy network balancers it offers the perfect HA Database solution.
# Security π
All of the following is done automatically.
- None has direct access to the DB/RDS instance except the DBA team and client DB connections are over SSL. The authentication and the SSL offloading are done at the ProxySQL side.
- The personal DB users activity is monitored using the *dynamic ProxySQL rules
- We set rules to deny particular SQL queries to be executed, for example:
- ProxySQL config backups retention period with minimal disk usage
DELETE FROM TABLE ...;
ALTER TABLE ...;
In this way we achieve easy, fast and secure management of the DB access in staging and production environments.
# ProxySQL HA AWS based setup
The following is one of our current projects that uses ProxySQL with High Availability solution
It is based on a couple of ec2 ProxySQL instances in an Autoscaling Group, plus one additional identical instance NOT in the autoscaling group.
# To summarize:
- ProxySQL gives us the ability to offer different DB sources.
- We can scale up and down the DB backend resources when needed, and all transparent for the DB clients.
- ProxySQL allow us to use much more efficiently our DB instances, in the cloud and on premise.
- ProxySQL query rules make possible to route the SQL traffic in real time.
- We set rules that make sure the DB client always get to the right DB source.
- It allow us to save money and time. We dropped some huge RDS instances, as the ProxySQL usage rendered them not needed.
# Future:
We are waiting on Implement SESSION_TRACK_GTIDS exactly (opens new window) and on GTID consistent reads not working (opens new window) for future implementation in production of the ProxySQL - MySQL Binlog Reader (opens new window) and GTID in a view to start adaptive query routing based on GTID tracking (opens new window)
We are working on failover procedures and HA DB setup based on ProxySQL that will work in case of entire AWS zone is down, as it happened on September 26, 2021 (opens new window) or on December 7, 2021 (opens new window)