PallyCon DR (Disaster Recovery) System


Introducing the PallyCon DR (Disaster Recovery) System

The stability of cloud-based online services became a major concern after the large-scale failure of the AWS Seoul region in November last year. Traditionally, high-availability (HA) designs add redundancy within a single region to protect against failures of individual systems. However, HA within one region cannot cope with a failure that affects the whole region, which is why a multi-region DR system is needed. This article introduces the multi-region DR system applied to the PallyCon cloud service, which automatically responds to large-scale cloud platform failures and minimizes the resulting damage.

 

Introducing PallyCon DR System

 

Under normal conditions, the PallyCon DR system uses the AWS Seoul region as the main system. When the health check detects a failure of the main system in the Seoul region, the service automatically switches to the backup system in the Tokyo region.


Amazon Route 53 periodically checks the service status of the AWS Seoul region and, in the event of a failure, switches the service DNS to the Tokyo region.
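One common way to get this behavior from Route 53 is a pair of failover records, where the primary record is tied to a health check. The sketch below only builds the change batch as plain data. It is an illustration under assumptions, not PallyCon's actual configuration: the host names, the health check ID, the CNAME record type, and the TTL are all hypothetical.

```python
def failover_record_sets(name, primary_target, secondary_target,
                         health_check_id, ttl=60):
    """Build a Route 53 change batch with a PRIMARY/SECONDARY failover pair.

    All argument values are placeholders chosen for illustration.
    """
    def record(role, target, extra):
        rrset = {
            "Name": name,
            "Type": "CNAME",
            "SetIdentifier": f"{role.lower()}-{target}",
            "Failover": role,          # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        }
        rrset.update(extra)
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Seoul primary, Tokyo secondary (illustrative)",
        "Changes": [
            # The primary record is served only while its health check passes.
            record("PRIMARY", primary_target, {"HealthCheckId": health_check_id}),
            # The secondary record is served when the primary is unhealthy.
            record("SECONDARY", secondary_target, {}),
        ],
    }


batch = failover_record_sets(
    "license.example.com.",      # hypothetical service name
    "seoul.example.com",         # hypothetical Seoul endpoint
    "tokyo.example.com",         # hypothetical Tokyo endpoint
    "hc-1234",                   # hypothetical health check ID
)
# With boto3 this dict could be passed as the ChangeBatch argument of
# route53.change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch).
```

A short TTL matters in this kind of setup, because clients keep using a cached DNS answer until it expires, so the TTL adds to the effective switchover time.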

 

Health Check

PallyCon DR system health check function
Cycle: 30 seconds (can be shortened to a minimum of 10 seconds)
Regions: Seoul, Tokyo
Method: Calls a specific API, such as the DRM license request URL, to check whether the database connection in the region is normal
Failover condition: If a service failure is detected continuously for 3 minutes, the service switches to the Tokyo region. Once the Seoul region recovers and is detected as healthy continuously for 3 minutes, the service returns to the Seoul region.
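The failover and failback rule above can be sketched as a small state machine. This is a minimal illustration, assuming that "continuously for 3 minutes" at a 30-second cycle means 6 consecutive check results. The class and constant names are hypothetical, and in the real system this decision is made by the managed health check and DNS service rather than by application code.

```python
from dataclasses import dataclass

CHECK_INTERVAL_S = 30   # health check cycle from the table above
SWITCH_AFTER_S = 180    # 3 minutes of consistent results before switching


@dataclass
class FailoverController:
    active: str = "seoul"
    streak: int = 0  # consecutive checks that contradict the active region

    @property
    def threshold(self) -> int:
        # 180 s / 30 s = 6 consecutive checks (an assumed interpretation)
        return SWITCH_AFTER_S // CHECK_INTERVAL_S

    def observe(self, seoul_healthy: bool) -> str:
        """Record one health check result and return the active region."""
        # Seoul active + unhealthy, or Tokyo active + Seoul healthy,
        # both count toward a switch. Any other result resets the streak.
        contradicts = (self.active == "seoul") != seoul_healthy
        self.streak = self.streak + 1 if contradicts else 0
        if self.streak >= self.threshold:
            self.active = "tokyo" if self.active == "seoul" else "seoul"
            self.streak = 0
        return self.active


ctl = FailoverController()
for _ in range(6):
    region = ctl.observe(seoul_healthy=False)
print(region)  # tokyo
```

Requiring an unbroken streak in both directions keeps a single failed or successful check from flipping the service back and forth between regions.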
 

 

Alarming

Health check results are stored in Amazon CloudWatch. Through CloudWatch alarms linked to Amazon SNS, administrators are notified when disaster recovery processing takes place.
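As a rough sketch of how such an alarm might be defined, the function below builds the parameters for a CloudWatch alarm on a Route 53 health check metric. The alarm name, health check ID, SNS topic ARN, and the choice of three one-minute evaluation periods are assumptions for illustration and are not taken from PallyCon's setup.

```python
def health_alarm_params(health_check_id, sns_topic_arn, minutes=3):
    """Build keyword arguments for a CloudWatch alarm on a health check.

    Route 53 publishes HealthCheckStatus as 1 (healthy) or 0 (unhealthy),
    so the alarm fires when the minimum drops below 1.
    """
    return {
        "AlarmName": f"dr-health-{health_check_id}",   # hypothetical name
        "Namespace": "AWS/Route53",
        "MetricName": "HealthCheckStatus",
        "Dimensions": [{"Name": "HealthCheckId", "Value": health_check_id}],
        "Statistic": "Minimum",
        "Period": 60,                  # seconds per evaluation period
        "EvaluationPeriods": minutes,  # assumed to mirror the 3-minute rule
        "Threshold": 1.0,
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],  # notify when the check fails
        "OKActions": [sns_topic_arn],     # notify again on recovery
    }


params = health_alarm_params(
    "hc-1234",                                        # hypothetical ID
    "arn:aws:sns:us-east-1:123456789012:dr-alerts",   # hypothetical topic
)
# With boto3 these could be passed as
# boto3.client("cloudwatch").put_metric_alarm(**params).
```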

 

DR Server Architecture and Restrictions

The database used by the PallyCon service is replicated in real time to a cross-region replica. When the service is running in the Tokyo region because of a failure, it operates in a 'Read Only' state in which existing information can be queried and licenses can be issued. This backup system minimizes the impact of a regional failure on PallyCon's customers.

 

However, new data such as content packaging information cannot be written during the failure, because multi-master processing is not supported for the inter-regional database.
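The read-only restriction can be pictured with a small in-memory stand-in for the database. This is only a sketch of the behavior described above: the class, method, and exception names are hypothetical, and the real service relies on the replica database itself rejecting writes.

```python
class ReadOnlyModeError(RuntimeError):
    """Raised when a write is attempted while running on the replica."""


class PackagingStore:
    """In-memory stand-in for the replicated database (illustrative only)."""

    def __init__(self, read_only=False):
        self.read_only = read_only
        # Data that was replicated before the failure.
        self._keys = {"content-1": "key-abc"}

    def lookup_key(self, content_id):
        # Reads work in both regions, so licenses can still be issued.
        return self._keys.get(content_id)

    def register_packaging(self, content_id, key):
        # Writes are rejected while the service runs on the Tokyo replica.
        if self.read_only:
            raise ReadOnlyModeError("packaging info cannot be written during failover")
        self._keys[content_id] = key


tokyo = PackagingStore(read_only=True)
print(tokyo.lookup_key("content-1"))  # key-abc
try:
    tokyo.register_packaging("content-2", "key-def")
except ReadOnlyModeError as err:
    print("rejected:", err)
```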

 

By default, the backup system in the Tokyo region runs one instance of each major server, but auto-scaling can expand it automatically depending on the traffic.
