Scaling Ceph RGW

The Power of Keepalived for High Availability
July 3, 2026 by
Tinihub Inc.

When deploying Ceph Object Gateway (RGW), the gateway itself is essentially a stateless proxy. While you can scale horizontally by adding more RGW instances, you face a critical challenge: How do you present a single, reliable endpoint to your clients?

If your clients are hardcoded to a single RGW IP address and that instance goes down, your storage becomes inaccessible. This is where Keepalived becomes an essential component of your infrastructure.

The Architecture: Keepalived + RGW

Keepalived implements the Virtual Router Redundancy Protocol (VRRP). It allows you to configure a Virtual IP (VIP) that floats between your physical RGW nodes.

If the primary node hosting the VIP fails, Keepalived on the standby node notices the missing VRRP advertisements and takes over the VIP, typically within a few seconds at the default 1-second advertisement interval.

Benefits: With vs. Without Keepalived

Feature              | Without Keepalived                   | With Keepalived
Availability         | Single point of failure.             | High availability via failover.
Client Configuration | Hardcoded to specific nodes.         | Points to a single, stable VIP.
Maintenance          | Requires client-side changes.        | Transparent; move VIP to perform updates.
Complexity           | Low initially, high during outage.   | Moderate setup, high operational resilience.

Implementation Example

In this setup, we assume two nodes (node-1 and node-2) both running RGW. We want a shared VIP: 192.168.1.100.

Keepalived Configuration (keepalived.conf)

The configuration lives in /etc/keepalived/keepalived.conf on both nodes.

Note: Adjust the priority (higher on the node that should normally hold the VIP) and the interface name on each node.
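
A minimal keepalived.conf for node-1 might look like the sketch below. It is an illustration under assumptions rather than a definitive configuration: the interface name eth0, the virtual_router_id 51, the priorities, the /24 prefix, the names chk_rgw and VI_RGW, and the pgrep-based check (which assumes the gateway process is called radosgw and that pgrep is at /usr/bin/pgrep) are all placeholders chosen for the example. Only the VIP 192.168.1.100 and the two-node layout come from the setup described above.

```
# /etc/keepalived/keepalived.conf on node-1 (illustrative sketch)

vrrp_script chk_rgw {
    # Exits 0 while a process named "radosgw" exists (assumed process name)
    script "/usr/bin/pgrep -x radosgw"
    interval 2    # run the check every 2 seconds
    fall 2        # two consecutive failures mark the check as failed
    rise 2        # two consecutive successes mark it healthy again
}

vrrp_instance VI_RGW {
    state MASTER             # use BACKUP on node-2
    interface eth0           # assumption: replace with the real NIC name
    virtual_router_id 51     # assumption: must be identical on both nodes
    priority 150             # assumption: use a lower value (e.g. 100) on node-2
    advert_int 1             # VRRP advertisement interval in seconds

    virtual_ipaddress {
        192.168.1.100/24     # VIP from the text; the /24 prefix is assumed
    }

    track_script {
        chk_rgw              # a failed check puts this instance into FAULT
    }
}
```

Because the check has no weight set, a failing chk_rgw moves the instance into the FAULT state and releases the VIP, so the node with the next-highest priority can claim it.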

Why this is the "Gold Standard" for Ceph RGW

  1. Seamless Failover: Because the VIP persists, clients (S3 browsers, SDKs, backup tools) do not need to be reconfigured or updated when a node needs a reboot or suffers a kernel panic.

  2. Health Awareness: With a vrrp_script health check, Keepalived doesn't just check if the server is alive; it checks if the RGW service is actually running. If RGW crashes but the OS stays up, the VIP will still migrate, ensuring traffic stays directed toward a working gateway.

  3. Cost-Effective: Unlike hardware load balancers (F5/Citrix), which can be expensive, Keepalived is open-source, lightweight, and runs directly on your existing RGW Linux nodes.

Final Considerations

  • Load Balancing: Keepalived handles high availability, but it does not perform load balancing across all RGWs. In the VIP-only setup described here, only the node currently holding the VIP receives client traffic. If you have 10+ RGW nodes, consider putting HAProxy or Nginx in front of your RGWs, and have Keepalived manage the VIP for those load balancers instead.

  • Networking: Ensure that your network environment allows gratuitous ARP, which is how Keepalived communicates the VIP movement to the rest of the network.
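
Both considerations can be sketched as a variant of the Keepalived instance that runs on the load-balancer nodes instead of the RGW nodes. This is a hedged example, not a recommended production configuration: the names chk_haproxy and VI_LB, the interface eth0, the virtual_router_id 52, the priority, the /24 prefix, the killall path, and the gratuitous ARP values are assumptions made for illustration.

```
# /etc/keepalived/keepalived.conf on a load-balancer node (illustrative sketch)

vrrp_script chk_haproxy {
    # Signal 0 sends nothing; it only tests that a process named "haproxy" exists
    script "/usr/bin/killall -0 haproxy"
    interval 2
    fall 2
    rise 2
}

vrrp_instance VI_LB {
    state BACKUP
    interface eth0             # assumption: replace with the real NIC name
    virtual_router_id 52       # assumption: must match on all load-balancer nodes
    priority 100               # assumption: give the preferred node a higher value
    advert_int 1

    garp_master_repeat 5       # gratuitous ARPs sent per burst after becoming master
    garp_master_refresh 60     # re-announce the VIP every 60 seconds while master

    virtual_ipaddress {
        192.168.1.100/24       # same VIP as before; the /24 prefix is assumed
    }

    track_script {
        chk_haproxy
    }
}
```

The periodic refresh is optional; it can help on networks where switches or neighbours hold on to a stale ARP entry after the VIP has moved.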

Are you looking to integrate this with a specific load balancer like HAProxy, or are you aiming for a simple two-node active-passive setup?
