A Discussion of Graceful Shutdown in Spring Cloud

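Our production projects iterate constantly, and during version upgrades we want to avoid failed requests so that the microservices as a whole upgrade smoothly. In my view, a graceful shutdown should go like this: deregister the service from the registry -> stop the container from accepting new requests -> wait for in-flight requests to finish -> destroy beans and do the remaining cleanup. Below I describe a reasonably sound way to implement graceful shutdown.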

The kill command

Day to day, we usually stop a process with kill. Here is how its man page describes it:

```text
KILL(1)                    User Commands                    KILL(1)

NAME
       kill - terminate a process

SYNOPSIS
       kill [-s signal|-p] [-q sigval] [-a] [--] pid...
       kill -l [signal]

DESCRIPTION
       The command kill sends the specified signal to the specified process or process group. If no signal is specified, the TERM signal is sent. The TERM signal will kill processes which do not catch this signal. For other processes, it may be necessary to use the KILL (9) signal, since this signal cannot be caught.

       Most modern shells have a builtin kill function, with a usage rather similar to that of the command described here. The '-a' and '-p' options, and the possibility to specify processes by command name are a local extension.

       If sig is 0, then no signal is sent, but error checking is still performed.
```

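Interviews often ask which inter-process communication mechanisms exist, and the answer everyone knows by heart is semaphores, signals, pipes, sockets and so on. kill is one of these: it communicates with a process by sending it a signal. By default we simply run kill <pid>, which sends the process SIGTERM (15). kill -l lists all available signals: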

```text
 1) SIGHUP       2) SIGINT       3) SIGQUIT      4) SIGILL       5) SIGTRAP
 6) SIGABRT      7) SIGBUS       8) SIGFPE       9) SIGKILL     10) SIGUSR1
11) SIGSEGV     12) SIGUSR2     13) SIGPIPE     14) SIGALRM     15) SIGTERM
16) SIGSTKFLT   17) SIGCHLD     18) SIGCONT     19) SIGSTOP     20) SIGTSTP
21) SIGTTIN     22) SIGTTOU     23) SIGURG      24) SIGXCPU     25) SIGXFSZ
26) SIGVTALRM   27) SIGPROF     28) SIGWINCH    29) SIGIO       30) SIGPWR
31) SIGSYS      34) SIGRTMIN    35) SIGRTMIN+1  36) SIGRTMIN+2  37) SIGRTMIN+3
38) SIGRTMIN+4  39) SIGRTMIN+5  40) SIGRTMIN+6  41) SIGRTMIN+7  42) SIGRTMIN+8
43) SIGRTMIN+9  44) SIGRTMIN+10 45) SIGRTMIN+11 46) SIGRTMIN+12 47) SIGRTMIN+13
48) SIGRTMIN+14 49) SIGRTMIN+15 50) SIGRTMAX-14 51) SIGRTMAX-13 52) SIGRTMAX-12
53) SIGRTMAX-11 54) SIGRTMAX-10 55) SIGRTMAX-9  56) SIGRTMAX-8  57) SIGRTMAX-7
58) SIGRTMAX-6  59) SIGRTMAX-5  60) SIGRTMAX-4  61) SIGRTMAX-3  62) SIGRTMAX-2
63) SIGRTMAX-1  64) SIGRTMAX
```

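A process can, however, catch or ignore SIGTERM and keep running. When a process does not respond to SIGTERM, we may fall back to kill -9, which sends SIGKILL and kills the process forcibly. SIGKILL cannot be caught, so the process gets no chance to clean up; for a Java process, this means no shutdown hooks run.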

Spring's shutdown flow

Graceful shutdown means this: once the system sends SIGTERM, the application starts wrapping up and shuts down without disrupting the business.

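So how do we detect SIGTERM and act on it? There are two options:
1. At the JVM level, register our own hook with Runtime.getRuntime().addShutdownHook().
2. Use Spring's existing mechanism. On shutdown, Spring publishes a ContextClosedEvent to its listeners, so we only need to put our shutdown logic in a listener.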
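For option 1, here is a minimal stdlib-only sketch of a JVM shutdown hook. The class name ShutdownHookDemo and the hook thread's name are illustrative, and ProcessHandle needs Java 9+. If you run it and send kill <pid> from another shell, the hook's message should be printed. kill -9 <pid> will not print it, because SIGKILL cannot be caught.

```java
public class ShutdownHookDemo {

    /** Registers cleanup as a JVM shutdown hook and returns the hook thread. */
    public static Thread registerHook(Runnable cleanup) {
        // The JVM starts every registered hook when it shuts down: on normal exit,
        // Ctrl+C (SIGINT) or SIGTERM. Hooks are NOT run when the process gets SIGKILL.
        Thread hook = new Thread(cleanup, "graceful-shutdown-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }

    public static void main(String[] args) throws InterruptedException {
        registerHook(() -> System.out.println("Shutdown hook running: release resources here"));
        System.out.println("PID " + ProcessHandle.current().pid() + ", waiting for a signal...");
        Thread.sleep(Long.MAX_VALUE); // keep the JVM alive until it is signalled
    }
}
```

Spring's option 2 is built on top of option 1: the registerShutdownHook() method shown next registers exactly such a hook.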

Spring implements this mechanism in AbstractApplicationContext:

```java
public void registerShutdownHook() {
    if (this.shutdownHook == null) {
        // No shutdown hook registered yet: register one that closes the context
        this.shutdownHook = new Thread() {
            @Override
            public void run() {
                synchronized (startupShutdownMonitor) {
                    doClose();
                }
            }
        };
        Runtime.getRuntime().addShutdownHook(this.shutdownHook);
    }
}
```
```java
protected void doClose() {
    // Only the first caller gets past this check: closed is flipped from false to true via CAS
    if (this.active.get() && this.closed.compareAndSet(false, true)) {
        LiveBeansView.unregisterApplicationContext(this);
        try {
            publishEvent(new ContextClosedEvent(this));
        }
        catch (Throwable ex) {
            logger.warn("Exception thrown from ApplicationListener handling ContextClosedEvent", ex);
        }
        // Stop all Lifecycle beans, to avoid delays during individual destruction.
        if (this.lifecycleProcessor != null) {
            try {
                this.lifecycleProcessor.onClose();
            }
            catch (Throwable ex) {
                logger.warn("Exception thrown from LifecycleProcessor on context close", ex);
            }
        }
        // Destroy all cached singletons in the context's BeanFactory.
        destroyBeans();
        // Close the state of this context itself.
        closeBeanFactory();
        // Let subclasses do some final clean-up if they wish...
        onClose();
        this.active.set(false);
    }
}
```
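The CAS check at the top of doClose() keeps the close logic from running twice. This can happen, for example, when the shutdown hook and an explicit close() both reach it. Below is a stdlib-only sketch of that guard. The class and the counter are illustrative and are not Spring code; the counter stands in for the real close steps.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class CloseOnceContext {

    private final AtomicBoolean active = new AtomicBoolean(true);
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final AtomicInteger closeRuns = new AtomicInteger(); // how often the close body ran

    /** Mirrors the guard in doClose(): only the caller that wins the CAS runs the close body. */
    public void doClose() {
        if (active.get() && closed.compareAndSet(false, true)) {
            closeRuns.incrementAndGet(); // stands in for publishEvent(...), destroyBeans(), ...
            active.set(false);
        }
    }

    public int closeRunCount() {
        return closeRuns.get();
    }

    public static void main(String[] args) throws InterruptedException {
        CloseOnceContext context = new CloseOnceContext();
        // e.g. the JVM shutdown hook and an explicit close() racing each other
        Thread hook = new Thread(context::doClose);
        Thread explicitClose = new Thread(context::doClose);
        hook.start();
        explicitClose.start();
        hook.join();
        explicitClose.join();
        System.out.println("close body ran " + context.closeRunCount() + " time(s)"); // always 1
    }
}
```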

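So registerShutdownHook() registers a hook that calls doClose(). doClose() first uses CAS to flip the context's closed flag, so the close logic runs at most once, and then publishes a ContextClosedEvent. The event is published by calling publishEvent: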

```java
protected void publishEvent(Object event, @Nullable ResolvableType eventType) {
    Assert.notNull(event, "Event must not be null");
    if (logger.isTraceEnabled()) {
        logger.trace("Publishing event in " + getDisplayName() + ": " + event);
    }

    // Decorate event as an ApplicationEvent if necessary
    ApplicationEvent applicationEvent;
    if (event instanceof ApplicationEvent) {
        applicationEvent = (ApplicationEvent) event;
    }
    else {
        applicationEvent = new PayloadApplicationEvent<>(this, event);
        if (eventType == null) {
            eventType = ((PayloadApplicationEvent) applicationEvent).getResolvableType();
        }
    }

    // Multicast right now if possible - or lazily once the multicaster is initialized
    if (this.earlyApplicationEvents != null) {
        this.earlyApplicationEvents.add(applicationEvent);
    }
    else {
        getApplicationEventMulticaster().multicastEvent(applicationEvent, eventType);
    }

    // Publish event via parent context as well...
    if (this.parent != null) {
        if (this.parent instanceof AbstractApplicationContext) {
            ((AbstractApplicationContext) this.parent).publishEvent(event, eventType);
        }
        else {
            this.parent.publishEvent(event);
        }
    }
}
```

This calls getApplicationEventMulticaster().multicastEvent(applicationEvent, eventType), whose implementation class is SimpleApplicationEventMulticaster:

```java
@Override
public void multicastEvent(final ApplicationEvent event, @Nullable ResolvableType eventType) {
    ResolvableType type = (eventType != null ? eventType : resolveDefaultEventType(event));
    for (final ApplicationListener<?> listener : getApplicationListeners(event, type)) {
        Executor executor = getTaskExecutor();
        if (executor != null) {
            // An executor is configured: each listener is handed off to it
            executor.execute(() -> invokeListener(listener, event));
        }
        else {
            // No executor: listeners run one after another on the publishing thread
            invokeListener(listener, event);
        }
    }
}
```

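This shows that listeners can be invoked in parallel, via Executor executor = getTaskExecutor(). While debugging, however, I found that executor == null with our configuration. That is the default: the SimpleApplicationEventMulticaster that AbstractApplicationContext creates has no task executor, and one is only used if you define your own applicationEventMulticaster bean with an executor set. A null executor means the listeners are invoked sequentially on a single thread. That raises the question of listener ordering, which I come back to later. This covers the overall Spring Boot shutdown flow, so let's see how to put it to use.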
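To make the ordering point concrete, here is a toy multicaster loosely modelled on SimpleApplicationEventMulticaster. MiniMulticaster and OrderedListener are hypothetical names, and this is not Spring's code. Listeners are sorted by an order value, lowest first, which mirrors how Spring sorts listeners by Ordered/@Order before invoking them. With no executor, they run sequentially on the publishing thread. Integer.MIN_VALUE and Integer.MAX_VALUE correspond to Ordered.HIGHEST_PRECEDENCE and Ordered.LOWEST_PRECEDENCE.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.Executor;
import java.util.function.Consumer;

public class MiniMulticaster {

    /** Hypothetical listener holder: an order value (lower runs first) and a callback. */
    static final class OrderedListener {
        final int order;
        final Consumer<String> callback;

        OrderedListener(int order, Consumer<String> callback) {
            this.order = order;
            this.callback = callback;
        }
    }

    private final List<OrderedListener> listeners = new ArrayList<>();
    private Executor taskExecutor; // stays null unless set, like the default multicaster

    public void setTaskExecutor(Executor taskExecutor) {
        this.taskExecutor = taskExecutor;
    }

    public void addListener(int order, Consumer<String> callback) {
        listeners.add(new OrderedListener(order, callback));
    }

    public void multicast(String event) {
        // Sort by order value; List.sort is stable, so equal values keep registration order
        List<OrderedListener> sorted = new ArrayList<>(listeners);
        sorted.sort(Comparator.comparingInt(l -> l.order));
        for (OrderedListener listener : sorted) {
            if (taskExecutor != null) {
                taskExecutor.execute(() -> listener.callback.accept(event)); // may run concurrently
            } else {
                listener.callback.accept(event); // same thread, strictly one after another
            }
        }
    }

    public static void main(String[] args) {
        MiniMulticaster multicaster = new MiniMulticaster();
        List<String> calls = new ArrayList<>();
        multicaster.addListener(Integer.MAX_VALUE, e -> calls.add("other listener"));   // lowest precedence
        multicaster.addListener(Integer.MIN_VALUE, e -> calls.add("GracefulShutdown")); // highest precedence
        multicaster.multicast("ContextClosedEvent");
        System.out.println(calls); // [GracefulShutdown, other listener]
    }
}
```

Because invocation is sequential, a listener that blocks delays every listener sorted after it. A listener that waits for Tomcat's thread pool to drain is one such case, which is one reason the order matters.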

Implementing graceful shutdown for Spring Cloud services

The official Spring issue tracker has a long-standing issue, still open at the time of writing, that offers one approach. Let's take a look:

```java
private static class GracefulShutdown implements TomcatConnectorCustomizer,
        ApplicationListener<ContextClosedEvent> {

    private static final Logger log = LoggerFactory.getLogger(GracefulShutdown.class);

    private volatile Connector connector;

    @Override
    public void customize(Connector connector) {
        // Called while the embedded Tomcat is being set up; keep the Connector for later
        this.connector = connector;
    }

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // Stop accepting new requests
        this.connector.pause();
        Executor executor = this.connector.getProtocolHandler().getExecutor();
        if (executor instanceof ThreadPoolExecutor) {
            try {
                ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                // Reject new tasks but let in-flight requests finish, waiting up to 30 seconds
                threadPoolExecutor.shutdown();
                if (!threadPoolExecutor.awaitTermination(30, TimeUnit.SECONDS)) {
                    log.warn("Tomcat thread pool did not shut down gracefully within "
                            + "30 seconds. Proceeding with forceful shutdown");
                }
            }
            catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }

}
```

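The idea is to implement the TomcatConnectorCustomizer interface to get hold of Tomcat's Connector (background: Tomcat's architecture). When the ContextClosedEvent arrives, the listener does three things:
1. It pauses the connector.
2. It gets Tomcat's thread pool and calls shutdown() on it (background: the difference between a thread pool's shutdown() and shutdownNow()).
3. It waits for the running tasks to finish.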
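The approach depends on the difference between shutdown() and shutdownNow(). Here is a stdlib-only sketch that contrasts the two on a single-threaded pool; the class and method names are illustrative.
- shutdown() stops accepting new tasks but still runs the queued ones. This is what lets in-flight requests drain.
- shutdownNow() interrupts the running task and returns the queued tasks unexecuted.

```java
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ShutdownVsShutdownNow {

    // One worker thread: the first task runs immediately, the rest wait in the queue
    static ThreadPoolExecutor singleThreadPool() {
        return new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    /** shutdown(): rejects new tasks, but the running task and all queued tasks still complete. */
    public static int drainWithShutdown(int tasks) throws InterruptedException {
        AtomicInteger finished = new AtomicInteger();
        ThreadPoolExecutor pool = singleThreadPool();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(20); // simulated request handling
                    finished.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS); // same wait-with-timeout pattern as above
        return finished.get();
    }

    /** shutdownNow(): interrupts the running task and returns queued tasks without running them. */
    public static int abortWithShutdownNow(int tasks) throws InterruptedException {
        ThreadPoolExecutor pool = singleThreadPool();
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(Long.MAX_VALUE); // a "request" that only ends when interrupted
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        List<Runnable> neverStarted = pool.shutdownNow();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return neverStarted.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shutdown():    " + drainWithShutdown(3) + " of 3 tasks finished"); // 3 of 3
        System.out.println("shutdownNow(): " + abortWithShutdownNow(3) + " queued tasks never ran"); // 2
    }
}
```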

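In my view, a graceful shutdown should go: notify the registry -> stop accepting new requests -> let in-flight requests finish -> remaining cleanup. So I think the approach above may be a problem in a Spring Cloud setup. It can pause the connector while the instance is still registered, so callers that still see the instance in the registry keep sending it requests it will no longer serve. I rewrote it with two main changes:
1. The GracefulShutdown listener is ordered first among Spring's listeners, so it runs first. Spring sorts listeners by their Ordered/@Order value before invoking them.
2. On receiving the ContextClosedEvent, it first deregisters from the registry.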

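```java
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

import com.netflix.discovery.EurekaClient;
import org.apache.catalina.connector.Connector;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.boot.web.embedded.tomcat.TomcatConnectorCustomizer;
import org.springframework.context.ApplicationListener;
import org.springframework.context.event.ContextClosedEvent;
import org.springframework.core.Ordered;

public class GracefulShutdown implements TomcatConnectorCustomizer,
        ApplicationListener<ContextClosedEvent>, Ordered {

    /**
     * Give the GracefulShutdown listener the highest precedence so it runs before other listeners
     */
    private final int order = Ordered.HIGHEST_PRECEDENCE;
    private static final Logger logger = LoggerFactory.getLogger(GracefulShutdown.class);
    private volatile Connector connector;
    private final EurekaClient eurekaClient;

    public GracefulShutdown(EurekaClient eurekaClient) {
        this.eurekaClient = eurekaClient;
    }

    @Override
    public void customize(Connector connector) {
        this.connector = connector;
    }

    @Override
    public void onApplicationEvent(ContextClosedEvent event) {
        // 1. Deregister from Eureka first, so the registry stops handing out this instance
        logger.info("eureka client shutting down");
        eurekaClient.shutdown();
        logger.info("Completed shut down eureka client");
        // 2. Stop the connector from accepting new requests
        this.connector.pause();
        logger.info("connector pause");
        // 3. Let in-flight requests finish, waiting at most 10 seconds
        Executor executor = this.connector.getProtocolHandler().getExecutor();
        if (executor instanceof ThreadPoolExecutor) {
            try {
                ThreadPoolExecutor threadPoolExecutor = (ThreadPoolExecutor) executor;
                threadPoolExecutor.shutdown();
                logger.info("connector executor shutting down");
                if (!threadPoolExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
                    logger.warn("Tomcat thread pool did not shut down gracefully within "
                            + "10 seconds. Proceeding with forceful shutdown");
                }
                logger.info("completed shut down connector executor");
            }
            catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    }

    @Override
    public int getOrder() {
        return this.order;
    }
}
```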
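The ordering argument can be checked with a tiny simulation. Everything here is hypothetical and stdlib-only, under these assumptions:
- Instance stands in for a registered service.
- A paused connector is modelled as refusing the request.
- Client-side registry caching is ignored. In a real deployment, Eureka clients refresh their local copy of the registry periodically, so some requests may still arrive shortly after deregistration.

Pausing before deregistering, which can happen with the issue's version, lets a caller hit a paused instance. Deregistering first avoids that.

```java
import java.util.ArrayList;
import java.util.List;

public class ShutdownOrderSimulation {

    /** Hypothetical stand-in for one service instance. */
    static final class Instance {
        boolean registered = true; // still listed in the (hypothetical) registry
        boolean paused = false;    // connector paused: new requests are refused

        /** Fate of a request arriving now; callers only route to registered instances. */
        String handleIncoming() {
            if (!registered) {
                return "routed elsewhere";
            }
            return paused ? "REJECTED" : "ok";
        }
    }

    /** Applies the steps in order and records the fate of one request arriving after each step. */
    public static List<String> simulate(List<String> steps) {
        Instance instance = new Instance();
        List<String> outcomes = new ArrayList<>();
        for (String step : steps) {
            switch (step) {
                case "deregister":
                    instance.registered = false;
                    break;
                case "pause":
                    instance.paused = true;
                    break;
                default:
                    throw new IllegalArgumentException("unknown step: " + step);
            }
            outcomes.add(instance.handleIncoming());
        }
        return outcomes;
    }

    public static void main(String[] args) {
        // Pause first, deregister later
        System.out.println(simulate(List.of("pause", "deregister"))); // [REJECTED, routed elsewhere]
        // Deregister first, as the rewritten GracefulShutdown does
        System.out.println(simulate(List.of("deregister", "pause"))); // [routed elsewhere, routed elsewhere]
    }
}
```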

In Spring Boot 2.x, register it in a @Configuration class like this:

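```java
/**
 * Registers the GracefulShutdown listener used to shut Tomcat down gracefully.
 * @param eurekaClient the Eureka client used to deregister this instance
 * @return the GracefulShutdown bean
 */
@Bean
public GracefulShutdown gracefulShutdown(EurekaClient eurekaClient) {
    return new GracefulShutdown(eurekaClient);
}

/**
 * Attaches GracefulShutdown to the embedded Tomcat's connector so it can pause it on shutdown.
 * @param gracefulShutdown the GracefulShutdown bean defined above
 * @return a customizer for the embedded Tomcat web server factory
 */
@Bean
public WebServerFactoryCustomizer<TomcatServletWebServerFactory> tomcatCustomizer(
        GracefulShutdown gracefulShutdown) {
    return factory -> factory.addConnectorCustomizers(gracefulShutdown);
}
```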

That's it.

The small loose ends and background topics mentioned above are worth looking up if you're interested.