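Alibaba Second-Round Interview: How Do You Do Graceful Release with Nacos as the Registry?
Hi everyone, I'm Jun Ge.
Today let's talk about how to do a graceful release when Nacos is your service registry.
Like any other registry, Nacos is used as shown in the figure below:
After it starts, the Service Provider registers with Nacos Server. The Service Consumer pulls the service list from Nacos Server and uses a load-balancing algorithm to pick one Service Provider to send each request to.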
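To make the register, pull and select cycle concrete, here is a minimal in-memory sketch. InMemoryRegistry and RandomConsumer are hypothetical classes, not the Nacos API, and random selection is an assumed stand-in for the consumer's load-balancing algorithm.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory registry modelling the flow above; not the Nacos API.
class InMemoryRegistry {
    private final Map<String, List<String>> services = new ConcurrentHashMap<>();

    // Provider side: register an "ip:port" address under a service name.
    void register(String service, String address) {
        services.computeIfAbsent(service, k -> new CopyOnWriteArrayList<>()).add(address);
    }

    // Provider side: remove an address when the instance goes offline.
    void deregister(String service, String address) {
        List<String> list = services.get(service);
        if (list != null) {
            list.remove(address);
        }
    }

    // Consumer side: pull a snapshot of the current service list.
    List<String> pull(String service) {
        return new ArrayList<>(services.getOrDefault(service, List.of()));
    }
}

// Hypothetical consumer that picks one provider at random.
class RandomConsumer {
    private final Random random;

    RandomConsumer(Random random) {
        this.random = random;
    }

    // Returns one address from the pulled list, or null if the list is empty.
    String choose(List<String> addresses) {
        if (addresses.isEmpty()) {
            return null;
        }
        return addresses.get(random.nextInt(addresses.size()));
    }
}
```

Both graceful-release problems below are about the window in which pull() returns an address that should not (yet, or any longer) receive traffic.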
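1. Requirements for graceful release
For a graceful release, the Service Provider must be able to receive and process requests normally once it goes online (registers with Nacos), and must stop receiving requests once it has been stopped. That gives two requirements:
- Graceful startup: until the Service Provider has finished starting up, the Service Consumer should not get its address from the service list;
- Graceful shutdown: once the Service Provider goes offline, the Service Consumer should no longer get its address from the service list.
Solve these two problems and graceful release is achieved.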
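2. Setting up the environment
The point of setting up this environment is to read the Nacos logs and use them to locate the relevant source code. The environment used in this article is shown in the figure below:
2.1 Starting the provider
Start the springboot-provider application, which registers with Nacos. The startup log is as follows: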
2023-06-11 18:58:10,120 [main] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
2023-06-11 18:58:10,121 [main] [INFO] com.alibaba.nacos.client.naming - [REGISTER-SERVICE] public registering service DEFAULT_GROUP@@springboot-provider with instance: Instance{instanceId='null', ip='192.168.31.94', port=8083, weight=1.0, healthy=true, enabled=true, ephemeral=true, clusterName='DEFAULT', serviceName='null', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}}
2023-06-11 18:58:10,133 [main] [INFO] com.alibaba.cloud.nacos.registry.NacosServiceRegistry - nacos registry, DEFAULT_GROUP springboot-provider 192.168.31.94:8083 register finished
2023-06-11 18:58:10,221 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat initialized with port(s): 18082 (http)
2023-06-11 18:58:10,222 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Initializing ProtocolHandler ["http-nio-127.0.0.1-18082"]
2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardService - Starting service [Tomcat]
2023-06-11 18:58:10,223 [main] [INFO] org.apache.catalina.core.StandardEngine - Starting Servlet engine: [Apache Tomcat/9.0.21]
2023-06-11 18:58:10,239 [main] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat-1].[localhost].[/] - Initializing Spring embedded WebApplicationContext
2023-06-11 18:58:10,239 [main] [INFO] org.springframework.web.context.ContextLoader - Root WebApplicationContext: initialization completed in 99 ms
2023-06-11 18:58:10,268 [main] [INFO] org.springframework.boot.actuate.endpoint.web.EndpointLinksResolver - Exposing 22 endpoint(s) beneath base path '/actuator'
2023-06-11 18:58:10,336 [main] [INFO] org.apache.coyote.http11.Http11NioProtocol - Starting ProtocolHandler ["http-nio-127.0.0.1-18082"]
2023-06-11 18:58:10,340 [main] [INFO] org.springframework.boot.web.embedded.tomcat.TomcatWebServer - Tomcat started on port(s): 18082 (http) with context path ''
2023-06-11 18:58:10,342 [main] [INFO] boot.Application - Started Application in 7.051 seconds (JVM running for 7.874)
2023-06-11 18:58:10,358 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider.properties+DEFAULT_GROUP
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider.properties, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider-dev.properties+DEFAULT_GROUP
2023-06-11 18:58:10,359 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider-dev.properties, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.ClientWorker - [fixed-39.105.183.91_8848] [subscribe] springboot-provider+DEFAULT_GROUP
2023-06-11 18:58:10,360 [main] [INFO] com.alibaba.nacos.client.config.impl.CacheData - [fixed-39.105.183.91_8848] [add-listener] ok, tenant=, dataId=springboot-provider, group=DEFAULT_GROUP, cnt=1
2023-06-11 18:58:10,639 [RMI TCP Connection(1)-192.168.31.94] [INFO] org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
2023-06-11 18:58:10,839 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - [BEAT] adding beat: BeatInfo{port=8083, ip='192.168.31.94', weight=1.0, serviceName='DEFAULT_GROUP@@springboot-provider', cluster='DEFAULT', metadata={management.endpoints.web.base-path=/actuator, management.port=18082, preserved.register.source=SPRING_CLOUD, management.address=127.0.0.1}, scheduled=false, period=5000, stopped=false} to beat map.
2023-06-11 18:58:10,840 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - modified ips(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
2023-06-11 18:58:10,841 [com.alibaba.nacos.client.naming.updater] [INFO] com.alibaba.nacos.client.naming - current ips:(1) service: DEFAULT_GROUP@@springboot-provider@@DEFAULT -> [{"instanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider","ip":"192.168.31.94","port":8083,"weight":1.0,"healthy":true,"enabled":true,"ephemeral":true,"clusterName":"DEFAULT","serviceName":"DEFAULT_GROUP@@springboot-provider","metadata":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1"},"ipDeleteTimeout":30000,"instanceHeartBeatInterval":5000,"instanceHeartBeatTimeOut":15000}]
Now look at the Nacos server log, in the file naming-server.log:
2023-06-11 18:58:09,723 INFO Client connection 192.168.31.94:51885#true connect
2023-06-11 18:58:10,105 INFO Client change for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=1}, 192.168.31.94:8083#true
2023-06-11 18:58:18,204 INFO Client connection 192.168.31.94:60850#true disconnect, remove instances and subscribers
After springboot-provider starts successfully, the Nacos console shows the following:
2.2 Taking the provider offline
After the service goes offline, the Nacos log is as follows:
2023-06-11 19:01:03,375 INFO Client connection 192.168.31.94:51885#true disconnect, remove instances and subscribers
2023-06-11 19:01:05,048 INFO [AUTO-DELETE-IP] service: Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, ip: {"ip":"192.168.31.94","port":8083,"healthy":false,"cluster":"DEFAULT","extendDatum":{"management.endpoints.web.base-path":"/actuator","management.port":"18082","preserved.register.source":"SPRING_CLOUD","management.address":"127.0.0.1","customInstanceId":"192.168.31.94#8083#DEFAULT#DEFAULT_GROUP@@springboot-provider"},"lastHeartBeatTime":1686481231604,"metadataId":"192.168.31.94:8083:DEFAULT"}
2023-06-11 19:01:05,048 INFO Client remove for service Service{namespace='public', group='DEFAULT_GROUP', name='springboot-provider', ephemeral=true, revision=2}, 192.168.31.94:8083#true
2023-06-11 19:01:08,379 INFO Client connection 192.168.31.94:8083#true disconnect, remove instances and subscribers
2.3 Service invocation
In springboot-consumer, run a unit test that uses a FeignClient to call the following method:
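import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;

@FeignClient(value = "springboot-provider", configuration = FeignMultipartSupportConfig.class)
public interface FeignAsEurekaClient {
    @PostMapping("/employee/save")
    String saveEmployeebyName(@RequestBody Employee employee);
}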
The log is as follows:
2023-06-11 19:15:47,694 [main] [INFO] org.springframework.test.context.transaction.TransactionContext - Began transaction (1) for test context [DefaultTestContext@5bf0d49 testClass = TestFeignAsEurekaClient, testInstance = boot.service.TestFeignAsEurekaClient@10683d9d, testMethod = testPostEmployByFeign@TestFeignAsEurekaClient, testException = [null], mergedContextConfiguration = [WebMergedContextConfiguration@5b7a5baa testClass = TestFeignAsEurekaClient, locations = '{}', classes = '{class boot.Application, class boot.Application}', contextInitializerClasses = '[]', activeProfiles = '{}', propertySourceLocations = '{}', propertySourceProperties = '{org.springframework.boot.test.context.SpringBootTestContextBootstrapper=true, server.port=0}', contextCustomizers = set[org.springframework.boot.test.context.filter.ExcludeFilterContextCustomizer@166fa74d, org.springframework.boot.test.json.DuplicateJsonObjectContextCustomizerFactory$DuplicateJsonObjectContextCustomizer@588df31b, org.springframework.boot.test.mock.mockito.MockitoContextCustomizer@0, org.springframework.boot.test.web.client.TestRestTemplateContextCustomizer@7fad8c79, org.springframework.boot.test.autoconfigure.properties.PropertyMappingContextCustomizer@0, org.springframework.boot.test.autoconfigure.web.servlet.WebDriverContextCustomizerFactory$Customizer@10b48321], resourceBasePath = 'src/main/webapp', contextLoader = 'org.springframework.boot.test.context.SpringBootContextLoader', parent = [null]], attributes = map['org.springframework.test.context.web.ServletTestExecutionListener.activateListener' -> false]]; transaction manager [org.springframework.jdbc.datasource.DataSourceTransactionManager@693676d]; rollback [true]
2023-06-11 19:15:47,941 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2023-06-11 19:15:47,962 [main] [INFO] com.netflix.loadbalancer.BaseLoadBalancer - Client: springboot-provider instantiated a LoadBalancer: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[],Load balancer stats=Zone stats: {},Server stats: []}ServerList:null
2023-06-11 19:15:47,969 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - Using serverListUpdater PollingServerListUpdater
2023-06-11 19:15:48,064 [main] [INFO] com.netflix.config.ChainedDynamicProperty - Flipping property: springboot-provider.ribbon.ActiveConnectionsLimit to use NEXT property: niws.loadbalancer.availabilityFilteringRule.activeConnectionsLimit = 2147483647
2023-06-11 19:15:48,064 [main] [INFO] com.netflix.loadbalancer.DynamicServerListLoadBalancer - DynamicServerListLoadBalancer for client springboot-provider initialized: DynamicServerListLoadBalancer:{NFLoadBalancer:name=springboot-provider,current list of Servers=[192.168.31.94:8083],Load balancer stats=Zone stats: {unknown=[Zone:unknown; Instance count:1; Active connections count: 0; Circuit breaker tripped count: 0; Active connections per server: 0.0;]
},Server stats: [[Server:192.168.31.94:8083; Zone:UNKNOWN; Total Requests:0; Successive connection failure:0; Total blackout seconds:0; Last connection made:Thu Jan 01 08:00:00 CST 1970; First connection made: Thu Jan 01 08:00:00 CST 1970; Active Connections:0; total failure count in last (1000) msecs:0; average resp time:0.0; 90 percentile resp time:0.0; 95 percentile resp time:0.0; min resp time:0.0; max resp time:0.0; stddev resp time:0.0]
]}ServerList:com.alibaba.cloud.nacos.ribbon.NacosServerList@24d998ba
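Note that OpenFeign is used here, and it relies on Ribbon for load balancing, so we also have to account for how often Ribbon refreshes its local service list. According to the source code, the refresh period is 30s, as shown in the figure below:
Ribbon's cache refresh logic is in the following code: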
// PollingServerListUpdater.java (Ribbon)
public synchronized void start(final UpdateAction updateAction) {
    if (isActive.compareAndSet(false, true)) {
        final Runnable wrapperRunnable = new Runnable() {
            @Override
            public void run() {
                //...
            }
        };
        scheduledFuture = getRefreshExecutor().scheduleWithFixedDelay(
                wrapperRunnable,
                initialDelayMs,
                refreshIntervalMs, // 30s by default
                TimeUnit.MILLISECONDS
        );
    } //...
}
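The stale window this creates can be reproduced with a plain ScheduledExecutorService. The sketch below is a simplified, hypothetical stand-in for PollingServerListUpdater: it copies a shared "registry" list into a local cache with scheduleWithFixedDelay, so an address that has been deregistered stays in the cache until the next refresh. The intervals are shortened to milliseconds for the demo; Ribbon's default is 30s (it is exposed as the ServerListRefreshInterval client property, which is worth confirming for your Ribbon version).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical model of a polling server-list cache; not Ribbon's code.
class PollingCache {
    private final List<String> registry;   // source of truth (stands in for Nacos); must be thread-safe
    private volatile List<String> cache;   // consumer's local snapshot
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "server-list-refresh");
                t.setDaemon(true);         // do not keep the JVM alive
                return t;
            });

    PollingCache(List<String> registry) {
        this.registry = registry;
        this.cache = new ArrayList<>(registry);
    }

    // Same scheduling primitive as PollingServerListUpdater.start().
    void start(long initialDelayMs, long refreshIntervalMs) {
        executor.scheduleWithFixedDelay(
                () -> cache = new ArrayList<>(registry),
                initialDelayMs, refreshIntervalMs, TimeUnit.MILLISECONDS);
    }

    // What the load balancer would choose from.
    List<String> servers() {
        return cache;
    }

    void stop() {
        executor.shutdownNow();
    }
}
```

Until the first refresh fires, servers() still returns the removed address; with a 30s interval that gap can be up to 30s.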
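3. Graceful release
As mentioned in Section 1, graceful release has two requirements: graceful startup and graceful shutdown.
The Nacos client and server interact via long polling. When the server receives a client request, it first checks whether its local data differs from the client's copy (by comparing MD5 digests). If it has changed, the server notifies the client immediately; otherwise the request is parked in a long-polling queue. If the data changes during that time, the client is notified right away; otherwise the server responds when the timeout expires. The code is as follows (it comes from LongPollingService in the Nacos config-server module):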
// LongPollingService.java (Nacos config server)
public void addLongPollingClient(HttpServletRequest req, HttpServletResponse rsp, Map<String, String> clientMd5Map,
        int probeRequestSize) {
    String str = req.getHeader(LongPollingService.LONG_POLLING_HEADER);
    int delayTime = SwitchService.getSwitchInteger(SwitchService.FIXED_DELAY_TIME, 500);
    // Add delay time for LoadBalance, and one response is returned 500 ms in advance to avoid client timeout.
    long timeout = -1L;
    if (isFixedPolling()) {
        //...
    } else {
        // Client timeout from the header (e.g. 30000 ms) minus 500 ms = 29500 ms, but never below 10 s
        timeout = Math.max(10000, Long.parseLong(str) - delayTime);
        long start = System.currentTimeMillis();
        List<String> changedGroups = MD5Util.compareMd5(req, rsp, clientMd5Map);
        if (changedGroups.size() > 0) {
            // The list has changed: return it to the client immediately
            generateResponse(req, rsp, changedGroups);
            return;
        } //...
    }
    String ip = RequestUtil.getRemoteIp(req);
    //..
    // Must be called by http thread, or send response.
    final AsyncContext asyncContext = req.startAsync();
    // AsyncContext.setTimeout() is incorrect, Control by oneself
    asyncContext.setTimeout(0L);
    String appName = req.getHeader(RequestUtil.CLIENT_APPNAME_HEADER);
    String tag = req.getHeader("Vipserver-Tag");
    // Nothing has changed: park the request in the long-polling queue
    ConfigExecutor.executeLongPolling(
            new ClientLongPolling(asyncContext, clientMd5Map, ip, probeRequestSize, timeout, appName, tag));
}
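As the server code above shows, the client sends a 30s timeout in the request header. The server holds the request for at most 29.5s, answering 500 ms early so that the client does not time out first (and it never holds for less than 10s). Because this is long polling, the client is notified as soon as the server-side list changes, so the impact on graceful release is very small.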
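The hold-or-answer-immediately decision can be sketched without a servlet container. In the hypothetical LongPollSketch below, a CompletableFuture stands in for AsyncContext: if the client's MD5 differs from the server's, the response completes at once; otherwise the request is parked until a change is published or the hold time elapses. holdTimeMs reproduces the Math.max(10000, timeout - delayTime) formula above. This models the pattern only, not Nacos's actual classes.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeUnit;

// Hypothetical model of the long-polling pattern; not Nacos's LongPollingService.
class LongPollSketch {
    private volatile String serverMd5;
    private final List<CompletableFuture<String>> parked = new CopyOnWriteArrayList<>();

    LongPollSketch(String initialMd5) {
        this.serverMd5 = initialMd5;
    }

    // Same formula as addLongPollingClient: answer delayMs before the client gives up, floor of 10 s.
    static long holdTimeMs(long clientTimeoutMs, long delayMs) {
        return Math.max(10_000L, clientTimeoutMs - delayMs);
    }

    // Completes with "changed" as soon as the data differs, or with "timeout" after holdMs.
    CompletableFuture<String> poll(String clientMd5, long holdMs) {
        if (!serverMd5.equals(clientMd5)) {
            return CompletableFuture.completedFuture("changed");   // respond immediately
        }
        CompletableFuture<String> request = new CompletableFuture<>();
        parked.add(request);
        if (!serverMd5.equals(clientMd5)) {
            request.complete("changed");                            // changed while we were parking
        }
        return request
                .completeOnTimeout("timeout", holdMs, TimeUnit.MILLISECONDS)   // Java 9+
                .whenComplete((result, error) -> parked.remove(request));
    }

    // A change wakes every parked request at once.
    void publish(String newMd5) {
        serverMd5 = newMd5;
        for (CompletableFuture<String> f : parked) {
            f.complete("changed");
        }
    }
}
```

The re-check after parking closes the race where a change is published between the first comparison and the request joining the queue.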
When the service list changes, the client notifies the registered listeners on a separate thread:
public void startInternal() {
    executor.schedule(() -> {
        while (!executor.isShutdown() && !executor.isTerminated()) {
            try {
                listenExecutebell.poll(5L, TimeUnit.SECONDS);
                //...
                executeConfigListen();
            } catch (Throwable e) {
                //...
            }
        }
    }, 0L, TimeUnit.MILLISECONDS);
}
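The listenExecutebell.poll(5L, TimeUnit.SECONDS) call is a wake-up pattern: the loop checks listeners at least every 5s, and sooner whenever something is put into the queue. A minimal sketch of that shape with a BlockingQueue; BellLoop, ring() and the checks counter are hypothetical names for illustration, and the maximum wait is passed in so the demo can use any interval.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the "bell" loop in startInternal(); not Nacos code.
class BellLoop {
    private final BlockingQueue<Object> bell = new ArrayBlockingQueue<>(1);
    final AtomicInteger checks = new AtomicInteger();   // how many times listeners were checked
    private volatile boolean running = true;
    private final Thread worker;

    BellLoop(long maxWaitMs) {
        worker = new Thread(() -> {
            while (running) {
                try {
                    // Wait for a ring, but never longer than maxWaitMs.
                    bell.poll(maxWaitMs, TimeUnit.MILLISECONDS);
                    checks.incrementAndGet();           // stands in for executeConfigListen()
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        }, "bell-loop");
        worker.setDaemon(true);
        worker.start();
    }

    // Wake the loop early; offer() is a no-op if a ring is already pending.
    void ring() {
        bell.offer(new Object());
    }

    void stop() {
        running = false;
        worker.interrupt();
    }
}
```

Even with a 60s maximum wait, a ring() gets the listeners checked almost immediately, which is why change notification does not have to wait for the poll interval.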
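3.1 Graceful startup
The problem at startup is that requests can arrive after the Service Provider has registered with Nacos but before the service has finished initializing. This happens mainly because the Service Provider registers with Nacos as soon as it starts, while the interfaces it exposes may not be fully initialized yet.
The fix is to disable automatic registration: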
spring.cloud.nacos.discovery.registerEnabled=false
and then register manually in code once the service has finished initializing:
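import com.alibaba.nacos.api.NacosFactory;
import com.alibaba.nacos.api.PropertyKeyConst;
import com.alibaba.nacos.api.naming.NamingService;
import java.util.Properties;

// Call once the service has finished initializing (both calls throw NacosException)
Properties settings = new Properties();
settings.put(PropertyKeyConst.SERVER_ADDR, "127.0.0.1:8848");
settings.put(PropertyKeyConst.USERNAME, "nacos");
settings.put(PropertyKeyConst.PASSWORD, "nacos");
NamingService naming = NacosFactory.createNamingService(settings);
naming.registerInstance("springboot-provider", "192.168.31.94", 8083);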
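The essential point is ordering: register only after the service is ready. A hypothetical sketch of that ordering in plain Java; Registrar and DelayedRegistration are illustrative names, and in a real application the registrar would wrap the registerInstance call above and be triggered at the end of startup (for example from a Spring Boot ApplicationReadyEvent listener, an assumption about how you wire it).

```java
// Hypothetical abstraction over the registry call (e.g. NamingService.registerInstance).
interface Registrar {
    void register(String service, String ip, int port);
}

class DelayedRegistration {
    // Runs initialization first and registers only if it completed without error.
    static boolean startAndRegister(Runnable warmUp, Registrar registrar,
                                    String service, String ip, int port) {
        try {
            warmUp.run();      // caches, connection pools, controllers, ...
        } catch (RuntimeException e) {
            return false;      // never advertise an instance that failed to start
        }
        registrar.register(service, ip, port);
        return true;
    }
}
```

Because registration is the last step, consumers cannot pull the address while warmUp is still running.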
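3.2 Graceful shutdown
A service goes offline in one of two ways: a normal shutdown or a service failure.
3.2.1 Normal shutdown
For a normal shutdown, Nacos relies on heartbeats to know whether an instance is online. Heartbeats are sent every 5s; if Nacos Server receives no heartbeat for 15s it marks the instance unhealthy, and only after 30s without a heartbeat does it delete the instance. These intervals can be configured through instance metadata: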
# Heartbeat interval: 5s -> 1s
spring.cloud.nacos.discovery.metadata.preserved.heart.beat.interval=1000
# Heartbeat timeout (instance marked unhealthy): 15s -> 3s
spring.cloud.nacos.discovery.metadata.preserved.heart.beat.timeout=3000
# Deletion timeout (instance removed): 30s -> 5s
spring.cloud.nacos.discovery.metadata.preserved.ip.delete.timeout=5000
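The two server-side thresholds can be read as a simple function of the time since the last heartbeat. A hypothetical sketch (HeartbeatPolicy and InstanceState are not Nacos classes) using the default and tuned values above; it ignores the period of the server's own check task, so real transitions happen somewhat later than these exact boundaries.

```java
// Hypothetical model of the heartbeat thresholds; not Nacos code.
enum InstanceState { HEALTHY, UNHEALTHY, DELETED }

class HeartbeatPolicy {
    final long beatTimeoutMs;    // preserved.heart.beat.timeout
    final long deleteTimeoutMs;  // preserved.ip.delete.timeout

    HeartbeatPolicy(long beatTimeoutMs, long deleteTimeoutMs) {
        this.beatTimeoutMs = beatTimeoutMs;
        this.deleteTimeoutMs = deleteTimeoutMs;
    }

    static final HeartbeatPolicy DEFAULTS = new HeartbeatPolicy(15_000, 30_000);
    static final HeartbeatPolicy TUNED = new HeartbeatPolicy(3_000, 5_000);

    // State the server would assign after msSinceLastBeat without a heartbeat.
    InstanceState stateAfter(long msSinceLastBeat) {
        if (msSinceLastBeat > deleteTimeoutMs) {
            return InstanceState.DELETED;
        }
        if (msSinceLastBeat > beatTimeoutMs) {
            return InstanceState.UNHEALTHY;
        }
        return InstanceState.HEALTHY;
    }
}
```

Even with the tuned values, a stopped instance still looks healthy for up to about 3s.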
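Even so, this does not guarantee that the service is removed from Nacos Server as soon as it stops; it may well keep receiving requests after it has stopped. The best approach is manual deregistration: for example, expose an API endpoint and, before the service shuts down, have a preStop hook call it to take the instance offline. Sample code for the endpoint: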
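import com.alibaba.nacos.api.PropertyKeyConst;
import com.alibaba.nacos.api.exception.NacosException;
import com.alibaba.nacos.client.naming.NacosNamingService;
import java.util.Properties;
import org.springframework.web.bind.annotation.GetMapping;

@GetMapping(value = "/nacos/deregisterInstance")
public String deregisterInstance() throws NacosException {
    Properties prop = new Properties();
    prop.setProperty(PropertyKeyConst.SERVER_ADDR, "localhost:8848");
    // The namespace must match the one the instance was registered in (the logs above show "public", the default)
    NacosNamingService client = new NacosNamingService(prop);
    client.deregisterInstance("springboot-provider", "192.168.31.94", 8083);
    return "success";
}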
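When Ribbon is involved, you also have to account for how it refreshes its locally cached service list: after deregistering manually, wait another 30s before shutting the service down.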
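Put together, the shutdown order is: deregister, wait for consumer caches to refresh, then stop. A hypothetical sketch of such a preStop-style sequence; deregister and stopServer are placeholders for a call to the endpoint above and the application's own shutdown, and the drain duration is an assumption you would size to cover the consumer's refresh interval (30s for Ribbon by default).

```java
import java.time.Duration;

// Hypothetical preStop-style sequence: deregister, drain, then stop.
class GracefulShutdown {
    static void run(Runnable deregister, Duration drain, Runnable stopServer) {
        deregister.run();                         // 1. disappear from the registry first
        try {
            Thread.sleep(drain.toMillis());       // 2. let consumer caches (e.g. Ribbon) refresh
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();   // keep the interrupt flag, but still shut down
        }
        stopServer.run();                         // 3. only now stop accepting requests
    }
}
```

Requests that arrive during the drain are still served normally, because the server is only stopped in step 3.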
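3.2.2 Service failure
The second case is a service that has failed but not stopped, and here it is hard to keep external requests from still arriving. The remedy is to act on the service's own health-check results: for example, after three consecutive failed health checks, call the API above to take the service offline.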
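The "three consecutive failures" rule is a small counter that resets on any success. A hypothetical sketch; FailureWatcher is not part of Nacos or Spring, onDeregister stands in for calling the deregistration endpoint above, and the threshold of 3 comes from the text.

```java
// Hypothetical health-check watcher; not part of Nacos or Spring.
class FailureWatcher {
    private final int threshold;
    private final Runnable onDeregister;
    private int consecutiveFailures = 0;
    private boolean deregistered = false;

    FailureWatcher(int threshold, Runnable onDeregister) {
        this.threshold = threshold;
        this.onDeregister = onDeregister;
    }

    // Feed one health-check result; returns true if this call triggered deregistration.
    synchronized boolean record(boolean healthy) {
        if (healthy) {
            consecutiveFailures = 0;   // any success resets the streak
            return false;
        }
        consecutiveFailures++;
        if (!deregistered && consecutiveFailures >= threshold) {
            deregistered = true;       // deregister only once
            onDeregister.run();
            return true;
        }
        return false;
    }
}
```

Usage: new FailureWatcher(3, deregisterCall) fed from whatever runs the periodic health check.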
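4. Summary
Whichever registry you use, graceful release comes down to two problems: graceful startup and graceful shutdown. This article explained how to achieve graceful release with Nacos based on how Nacos works internally. I hope it helps.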