A Look at PowerJob's Per-Machine Thread Concurrency
Preface
This article looks at PowerJob's per-machine thread concurrency setting, threadConcurrency.
threadConcurrency
powerjob-worker/src/main/java/tech/powerjob/worker/pojo/model/InstanceInfo.java
@Data
public class InstanceInfo implements Serializable {
/**
* Basic information
*/
private Long jobId;
private Long instanceId;
private Long wfInstanceId;
/**
* Processor information
*/
// Execution type: standalone, broadcast, or MapReduce
private String executeType;
// Processor type (JavaBean, Jar, script, etc.)
private String processorType;
// Processor info
private String processorInfo;
// Time expression (scheduling) type
private int timeExpressionType;
/**
* Timeout
*/
// Overall timeout for the whole task instance
private long instanceTimeoutMS;
/**
* Runtime parameters
*/
// Job-level parameters, analogous to a class's static variables
private String jobParams;
// Instance-level parameters, analogous to a class's instance variables
private String instanceParams;
// Upper limit on processing threads per machine
private int threadConcurrency;
// Sub-task retry count (retries of the task itself are controlled by the server)
private int taskRetryNum;
private String logConfig;
}
InstanceInfo defines threadConcurrency, the upper limit on the number of processing threads per machine.
maxDispatchNum
powerjob-worker/src/main/java/tech/powerjob/worker/core/tracker/task/heavy/HeavyTaskTracker.java
/**
* Periodically scans the tasks in the database (at most 100 per query, to limit memory usage) and dispatches the ones that need to run
*/
protected class Dispatcher implements Runnable {
// Database query limit: the maximum number of tasks fetched per query
private static final int DB_QUERY_LIMIT = 100;
@Override
public void run() {
if (finished.get()) {
return;
}
Stopwatch stopwatch = Stopwatch.createStarted();
// 1. Get the ProcessorTrackers that can currently accept tasks
List<String> availablePtIps = ptStatusHolder.getAvailableProcessorTrackers();
// 2. No ProcessorTracker is available, so skip this round
if (availablePtIps.isEmpty()) {
log.debug("[TaskTracker-{}] no available ProcessorTracker now.", instanceId);
return;
}
// 3. Dispatch in batches to avoid one large query
long currentDispatchNum = 0;
long maxDispatchNum = availablePtIps.size() * instanceInfo.getThreadConcurrency() * 2L;
AtomicInteger index = new AtomicInteger(0);
// 4. Query the database in a loop for tasks waiting to be dispatched
while (maxDispatchNum > currentDispatchNum) {
int dbQueryLimit = Math.min(DB_QUERY_LIMIT, (int) maxDispatchNum);
List<TaskDO> needDispatchTasks = taskPersistenceService.getTaskByStatus(instanceId, TaskStatus.WAITING_DISPATCH, dbQueryLimit);
currentDispatchNum += needDispatchTasks.size();
needDispatchTasks.forEach(task -> {
// Get the ProcessorTracker address; if the task already carries an address, use that one
String ptAddress = task.getAddress();
if (StringUtils.isEmpty(ptAddress) || RemoteConstant.EMPTY_ADDRESS.equals(ptAddress)) {
ptAddress = availablePtIps.get(index.getAndIncrement() % availablePtIps.size());
}
dispatchTask(task, ptAddress);
});
// Stop looping when fewer tasks than requested come back (none left, or the query failed)
if (needDispatchTasks.size() < dbQueryLimit) {
break;
}
}
log.debug("[TaskTracker-{}] dispatched {} tasks,using time {}.", instanceId, currentDispatchNum, stopwatch.stop());
}
}
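This computes maxDispatchNum as availablePtIps.size() * instanceInfo.getThreadConcurrency() * 2L, i.e. twice threadConcurrency for each available ProcessorTracker, and then dispatches tasks round-robin with availablePtIps.get(index.getAndIncrement() % availablePtIps.size()); a task that already carries its own address is sent to that address instead.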
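To make the batching arithmetic concrete, here is a minimal standalone sketch (not PowerJob code) of the Dispatcher's per-round limit and round-robin assignment. DispatchSketch, simulateRound and roundRobin are hypothetical names, and simulateRound replaces the database query with a plain count of pending tasks.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the Dispatcher arithmetic; not PowerJob's implementation.
class DispatchSketch {
    static final int DB_QUERY_LIMIT = 100;

    // Per-round upper bound, same formula as availablePtIps.size() * threadConcurrency * 2L
    static long maxDispatchNum(int availablePtCount, int threadConcurrency) {
        return availablePtCount * (long) threadConcurrency * 2L;
    }

    // Mirrors the while loop; a counter of pending tasks stands in for the database
    static long simulateRound(long pendingTasks, long maxDispatchNum) {
        long currentDispatchNum = 0;
        long remaining = pendingTasks;
        while (maxDispatchNum > currentDispatchNum) {
            int dbQueryLimit = Math.min(DB_QUERY_LIMIT, (int) maxDispatchNum);
            long fetched = Math.min(dbQueryLimit, remaining);
            remaining -= fetched;
            currentDispatchNum += fetched;
            // Fewer rows than requested: nothing left to dispatch
            if (fetched < dbQueryLimit) {
                break;
            }
        }
        return currentDispatchNum;
    }

    // Assigns task ids to tracker addresses the way index.getAndIncrement() % size does
    static Map<String, List<Integer>> roundRobin(List<String> ptAddresses, int taskCount) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String address : ptAddresses) {
            assignment.put(address, new ArrayList<>());
        }
        AtomicInteger index = new AtomicInteger(0);
        for (int taskId = 0; taskId < taskCount; taskId++) {
            String address = ptAddresses.get(index.getAndIncrement() % ptAddresses.size());
            assignment.get(address).add(taskId);
        }
        return assignment;
    }

    public static void main(String[] args) {
        List<String> pts = List.of("pt-1", "pt-2", "pt-3");
        long max = maxDispatchNum(pts.size(), 5);
        System.out.println("maxDispatchNum = " + max);              // maxDispatchNum = 30
        System.out.println(roundRobin(pts, (int) max).get("pt-1")); // [0, 3, 6, 9, 12, 15, 18, 21, 24, 27]
        System.out.println(simulateRound(1000, 250));               // 300
    }
}
```

With 3 available ProcessorTrackers and threadConcurrency = 5, one round dispatches at most 30 tasks, 10 to each tracker. If the sketch reproduces the loop faithfully, note that dbQueryLimit is not reduced as currentDispatchNum grows, so when maxDispatchNum exceeds 100 and is not a multiple of it, a round can dispatch a little more than maxDispatchNum: simulateRound(1000, 250) returns 300.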
ProcessorTracker
powerjob-worker/src/main/java/tech/powerjob/worker/core/tracker/processor/ProcessorTracker.java
calThreadPoolSize
private int calThreadPoolSize() {
ExecuteType executeType = ExecuteType.valueOf(instanceInfo.getExecuteType());
ProcessorType processorType = ProcessorType.valueOf(instanceInfo.getProcessorType());
// Script processors come with their own thread pool, but to keep the branching simple we still nominally allocate one thread
if (processorType == ProcessorType.PYTHON || processorType == ProcessorType.SHELL) {
return 1;
}
if (executeType == ExecuteType.MAP_REDUCE || executeType == ExecuteType.MAP) {
return instanceInfo.getThreadConcurrency();
}
if (TimeExpressionType.FREQUENT_TYPES.contains(instanceInfo.getTimeExpressionType())) {
return instanceInfo.getThreadConcurrency();
}
return 2;
}
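ProcessorTracker's calThreadPoolSize method determines the thread pool size from ProcessorType, ExecuteType and TimeExpressionType: ProcessorType.PYTHON or ProcessorType.SHELL yields 1; ExecuteType.MAP_REDUCE, ExecuteType.MAP, or a time expression type in TimeExpressionType.FREQUENT_TYPES yields instanceInfo.getThreadConcurrency(); everything else gets a fixed size of 2. Because the script check comes first, threadConcurrency never affects the pool size of a script processor.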
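The same branch order can be reproduced in isolation. The enums below are simplified stand-ins with abbreviated member lists (an assumption, not PowerJob's exact definitions), and FREQUENT_TYPES is assumed to contain the fixed-rate and fixed-delay types; PoolSizeSketch is a hypothetical name.

```java
import java.util.Set;

class PoolSizeSketch {
    // Simplified stand-ins for PowerJob's enums (abbreviated, assumed member lists)
    enum ExecuteType { STANDALONE, BROADCAST, MAP, MAP_REDUCE }
    enum ProcessorType { BUILT_IN, EXTERNAL, SHELL, PYTHON }
    enum TimeExpressionType { API, CRON, FIXED_RATE, FIXED_DELAY, WORKFLOW }

    // Assumption: the "frequent" types are fixed-rate and fixed-delay
    static final Set<TimeExpressionType> FREQUENT_TYPES =
            Set.of(TimeExpressionType.FIXED_RATE, TimeExpressionType.FIXED_DELAY);

    // Same branch order as ProcessorTracker.calThreadPoolSize
    static int calThreadPoolSize(ExecuteType executeType, ProcessorType processorType,
                                 TimeExpressionType timeType, int threadConcurrency) {
        // Script processors: one nominal thread, regardless of threadConcurrency
        if (processorType == ProcessorType.PYTHON || processorType == ProcessorType.SHELL) {
            return 1;
        }
        // Map and MapReduce jobs: pool size equals threadConcurrency
        if (executeType == ExecuteType.MAP_REDUCE || executeType == ExecuteType.MAP) {
            return threadConcurrency;
        }
        // Frequent jobs: pool size also equals threadConcurrency
        if (FREQUENT_TYPES.contains(timeType)) {
            return threadConcurrency;
        }
        // Everything else: a fixed pool of two threads
        return 2;
    }

    public static void main(String[] args) {
        System.out.println(calThreadPoolSize(ExecuteType.MAP, ProcessorType.SHELL, TimeExpressionType.CRON, 16));          // 1
        System.out.println(calThreadPoolSize(ExecuteType.MAP_REDUCE, ProcessorType.BUILT_IN, TimeExpressionType.CRON, 16)); // 16
        System.out.println(calThreadPoolSize(ExecuteType.STANDALONE, ProcessorType.BUILT_IN, TimeExpressionType.CRON, 16)); // 2
    }
}
```

For a standalone CRON job, setting threadConcurrency therefore changes nothing on the worker side in this sketch; it only matters for Map/MapReduce and frequent jobs.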
initThreadPool
private static final int THREAD_POOL_QUEUE_MAX_SIZE = 128;
private void initThreadPool() {
int poolSize = calThreadPoolSize();
// Queue of pending tasks; kept small so it doesn't put too much pressure on memory
BlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(THREAD_POOL_QUEUE_MAX_SIZE);
// Custom thread names for the pool (PowerJob Processor Pool -> PPP)
ThreadFactory threadFactory = new ThreadFactoryBuilder().setNameFormat("PPP-%d").build();
// Rejection policy: throw an exception immediately
RejectedExecutionHandler rejectionHandler = new ThreadPoolExecutor.AbortPolicy();
threadPool = new ThreadPoolExecutor(poolSize, poolSize, 60L, TimeUnit.SECONDS, queue, threadFactory, rejectionHandler);
// Allow core threads to time out when idle (so the pool may eventually have 0 live threads)
threadPool.allowCoreThreadTimeOut(true);
}
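initThreadPool creates a ThreadPoolExecutor whose core and maximum sizes both equal calThreadPoolSize(), backed by an ArrayBlockingQueue with capacity 128. The RejectedExecutionHandler is AbortPolicy, so once every thread is busy and the queue is full, submitting throws RejectedExecutionException. Since allowCoreThreadTimeOut(true) is set, idle threads are destroyed after 60 seconds.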
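To see exactly when AbortPolicy fires, the sketch below builds a pool with the same shape as initThreadPool (core size equal to max size, bounded ArrayBlockingQueue, AbortPolicy, core-thread timeout) and submits tasks that block until released. BoundedPoolDemo and countRejected are illustrative names; the custom thread factory is omitted, and only java.util.concurrent is used.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolDemo {
    // Counts how many of `submissions` blocking tasks a pool shaped like initThreadPool rejects
    static int countRejected(int poolSize, int queueCapacity, int submissions) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(poolSize, poolSize, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
        pool.allowCoreThreadTimeOut(true);
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < submissions; i++) {
            try {
                // Each task parks its worker thread until the latch is released
                pool.submit(() -> {
                    release.await();
                    return null;
                });
            } catch (RejectedExecutionException e) {
                // Threads busy and queue full: AbortPolicy throws instead of blocking the caller
                rejected++;
            }
        }
        release.countDown();
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println(countRejected(1, 128, 130)); // 1
        System.out.println(countRejected(2, 3, 8));     // 3
    }
}
```

The first poolSize submissions each start a worker thread, the next queueCapacity wait in the queue, and every further submission is rejected until a task finishes. With PowerJob's queue of 128, a ProcessorTracker can therefore hold at most poolSize + 128 unfinished tasks.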
submitTask
public void submitTask(TaskDO newTask) {
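// Once the ProcessorTracker has hit a fatal error, every task submitted here fails immediately, to prevent a deadlock
// Deadlock analysis: the TT creates the PT, PT creation fails so it never sends heartbeats; the TT hears no PT heartbeat for a long time and considers the PT dead (it really is), so it cannot choose an available PT to redispatch to, and a deadlock forms. Game over T_T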
if (lethal) {
ProcessorReportTaskStatusReq report = new ProcessorReportTaskStatusReq()
.setInstanceId(instanceId)
.setSubInstanceId(newTask.getSubInstanceId())
.setTaskId(newTask.getTaskId())
.setStatus(TaskStatus.WORKER_PROCESS_FAILED.getValue())
.setResult(lethalReason)
.setReportTime(System.currentTimeMillis());
TransportUtils.ptReportTask(report, taskTrackerAddress, workerRuntime);
return;
}
boolean success = false;
// 1. Set fields and submit for execution
newTask.setInstanceId(instanceInfo.getInstanceId());
newTask.setAddress(taskTrackerAddress);
HeavyProcessorRunnable heavyProcessorRunnable = new HeavyProcessorRunnable(instanceInfo, taskTrackerAddress, newTask, processorBean, omsLogger, statusReportRetryQueue, workerRuntime);
try {
threadPool.submit(heavyProcessorRunnable);
success = true;
} catch (RejectedExecutionException ignore) {
log.warn("[ProcessorTracker-{}] submit task(taskId={},taskName={}) to ThreadPool failed due to ThreadPool has too much task waiting to process, this task will dispatch to other ProcessorTracker.",
instanceId, newTask.getTaskId(), newTask.getTaskName());
} catch (Exception e) {
log.error("[ProcessorTracker-{}] submit task(taskId={},taskName={}) to ThreadPool failed.", instanceId, newTask.getTaskId(), newTask.getTaskName(), e);
}
// 2. Reply that the task was received
if (success) {
ProcessorReportTaskStatusReq reportReq = new ProcessorReportTaskStatusReq();
reportReq.setInstanceId(instanceId);
reportReq.setSubInstanceId(newTask.getSubInstanceId());
reportReq.setTaskId(newTask.getTaskId());
reportReq.setStatus(TaskStatus.WORKER_RECEIVED.getValue());
reportReq.setReportTime(System.currentTimeMillis());
TransportUtils.ptReportTask(reportReq, taskTrackerAddress, workerRuntime);
log.debug("[ProcessorTracker-{}] submit task(taskId={}, taskName={}) success, current queue size: {}.",
instanceId, newTask.getTaskId(), newTask.getTaskName(), threadPool.getQueue().size());
}
}
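submitTask first checks the lethal flag: if the ProcessorTracker is in a fatal state, it immediately reports WORKER_PROCESS_FAILED for the task. Otherwise it wraps the TaskDO in a HeavyProcessorRunnable and submits it to threadPool. If submission throws, success stays false; only when submission succeeds does it build a ProcessorReportTaskStatusReq with status WORKER_RECEIVED to tell the TaskTracker the task was received. A RejectedExecutionException is logged at warn level: [ProcessorTracker-{}] submit task(taskId={},taskName={}) to ThreadPool failed due to ThreadPool has too much task waiting to process, this task will dispatch to other ProcessorTracker.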
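The important property is that the acknowledgement is sent only after the thread pool has accepted the task. A minimal sketch of this submit-then-acknowledge pattern follows; the receivedReports list is a hypothetical stand-in for TransportUtils.ptReportTask, SubmitAckSketch and demo are illustrative names, and only RejectedExecutionException is handled (the original also catches other exceptions).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class SubmitAckSketch {
    private final ThreadPoolExecutor pool;
    // Stand-in for the WORKER_RECEIVED reports sent back to the TaskTracker
    final List<String> receivedReports = new ArrayList<>();
    // Tasks that got no acknowledgement because the pool rejected them
    final List<String> rejectedTasks = new ArrayList<>();

    SubmitAckSketch(ThreadPoolExecutor pool) {
        this.pool = pool;
    }

    void submitTask(String taskId, Runnable work) {
        boolean success = false;
        try {
            pool.submit(work);
            success = true;
        } catch (RejectedExecutionException e) {
            // PowerJob logs a warning here; no report is sent
            rejectedTasks.add(taskId);
        }
        // Acknowledge only after the pool accepted the task
        if (success) {
            receivedReports.add(taskId);
        }
    }

    // Submits taskCount blocking tasks to a pool of the given shape and returns the sketch for inspection
    static SubmitAckSketch demo(int poolSize, int queueCapacity, int taskCount) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(poolSize, poolSize, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity), new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocking = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        SubmitAckSketch sketch = new SubmitAckSketch(pool);
        for (int i = 1; i <= taskCount; i++) {
            sketch.submitTask("task-" + i, blocking);
        }
        release.countDown();
        pool.shutdown();
        return sketch;
    }

    public static void main(String[] args) {
        SubmitAckSketch sketch = demo(1, 1, 3);
        System.out.println("received: " + sketch.receivedReports); // received: [task-1, task-2]
        System.out.println("rejected: " + sketch.rejectedTasks);   // rejected: [task-3]
    }
}
```

A rejected task never produces a WORKER_RECEIVED report, so the TaskTracker gets no confirmation for it; per the warning message, such a task is expected to be dispatched to another ProcessorTracker.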
onReceiveProcessorReportTaskStatusReq
powerjob-worker/src/main/java/tech/powerjob/worker/actors/TaskTrackerActor.java
@Handler(path = WTT_HANDLER_REPORT_TASK_STATUS)
public AskResponse onReceiveProcessorReportTaskStatusReq(ProcessorReportTaskStatusReq req) {
int taskStatus = req.getStatus();
// Only heavyweight tasks use the two-level task status reporting mechanism
HeavyTaskTracker taskTracker = HeavyTaskTrackerManager.getTaskTracker(req.getInstanceId());
// This happens when the TaskTracker has been stopped manually
if (taskTracker == null) {
log.warn("[TaskTrackerActor] receive ProcessorReportTaskStatusReq({}) but system can't find TaskTracker.", req);
return null;
}
if (ProcessorReportTaskStatusReq.BROADCAST.equals(req.getCmd())) {
taskTracker.broadcast(taskStatus == TaskStatus.WORKER_PROCESS_SUCCESS.getValue(), req.getSubInstanceId(), req.getTaskId(), req.getResult());
}
taskTracker.updateTaskStatus(req.getSubInstanceId(), req.getTaskId(), taskStatus, req.getReportTime(), req.getResult());
// Update the workflow context
taskTracker.updateAppendedWfContext(req.getAppendedWfContext());
// Terminal statuses must be acknowledged
if (TaskStatus.FINISHED_STATUS.contains(taskStatus)) {
return AskResponse.succeed(null);
}
return null;
}
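When TaskTrackerActor receives a ProcessorReportTaskStatusReq, it updates the task status via updateTaskStatus (calling broadcast first for broadcast commands). For a status in FINISHED_STATUS it acknowledges with AskResponse.succeed(null); otherwise it returns null.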
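The reply rule can be isolated in a few lines. This is a hypothetical model, not the actor code: a map of maps stands in for HeavyTaskTrackerManager and the task table, and the boolean result says whether the handler would answer AskResponse.succeed(null) (true) or return null (false). It does not model updateTaskStatus's own internal checks or the broadcast branch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

class StatusHandlerSketch {
    // Values from TaskStatus: 5 = WORKER_PROCESS_FAILED, 6 = WORKER_PROCESS_SUCCESS
    static final Set<Integer> FINISHED_STATUS = Set.of(5, 6);

    // Hypothetical registry: instanceId -> (taskId -> latest status)
    final Map<Long, Map<String, Integer>> trackers = new HashMap<>();

    // true means "reply AskResponse.succeed(null)", false means "return null"
    boolean onReport(long instanceId, String taskId, int status) {
        Map<String, Integer> tracker = trackers.get(instanceId);
        if (tracker == null) {
            // TaskTracker was stopped: warn and return null
            return false;
        }
        tracker.put(taskId, status);
        // Only terminal statuses are explicitly acknowledged
        return FINISHED_STATUS.contains(status);
    }

    public static void main(String[] args) {
        StatusHandlerSketch handler = new StatusHandlerSketch();
        handler.trackers.put(1L, new HashMap<>());
        System.out.println(handler.onReport(1L, "task-1", 4)); // false (WORKER_PROCESSING)
        System.out.println(handler.onReport(1L, "task-1", 6)); // true (WORKER_PROCESS_SUCCESS)
        System.out.println(handler.onReport(2L, "task-1", 6)); // false (no TaskTracker)
    }
}
```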
TaskStatus
powerjob-worker/src/main/java/tech/powerjob/worker/common/constants/TaskStatus.java
@Getter
@AllArgsConstructor
public enum TaskStatus {
WAITING_DISPATCH(1, "waiting for the dispatcher"),
DISPATCH_SUCCESS_WORKER_UNCHECK(2, "dispatched (receipt by the worker not guaranteed)"),
WORKER_RECEIVED(3, "received by the worker, not yet running"),
WORKER_PROCESSING(4, "worker is processing"),
WORKER_PROCESS_FAILED(5, "worker processing failed"),
WORKER_PROCESS_SUCCESS(6, "worker processing succeeded");
public static final Set<Integer> FINISHED_STATUS = Sets.newHashSet(WORKER_PROCESS_FAILED.value, WORKER_PROCESS_SUCCESS.value);
private final int value;
private final String des;
public static TaskStatus of(int v) {
for (TaskStatus taskStatus : values()) {
if (v == taskStatus.value) {
return taskStatus;
}
}
throw new IllegalArgumentException("no TaskStatus match the value of " + v);
}
}
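The status column of the task_info table takes one of six values: WAITING_DISPATCH (waiting to be dispatched), DISPATCH_SUCCESS_WORKER_UNCHECK (dispatched, receipt unconfirmed), WORKER_RECEIVED (received by the worker), WORKER_PROCESSING (being processed), WORKER_PROCESS_FAILED (processing failed) and WORKER_PROCESS_SUCCESS (processing succeeded). The last two are terminal states and together form FINISHED_STATUS.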
HeavyProcessorRunnable
powerjob-worker/src/main/java/tech/powerjob/worker/core/processor/runnable/HeavyProcessorRunnable.java
public void run() {
// Switch the thread context class loader (otherwise the Worker class loader is used, which lacks the container classes and causes ClassNotFoundException during serialization/deserialization)
Thread.currentThread().setContextClassLoader(processorBean.getClassLoader());
try {
innerRun();
} catch (InterruptedException ignore) {
// ignore
} catch (Throwable e) {
reportStatus(TaskStatus.WORKER_PROCESS_FAILED, e.toString(), null, null);
log.error("[ProcessorRunnable-{}] execute failed, please contact the author(@KFCFans) to fix the bug!", task.getInstanceId(), e);
} finally {
ThreadLocalStore.clear();
}
}
public void innerRun() throws InterruptedException {
final BasicProcessor processor = processorBean.getProcessor();
String taskId = task.getTaskId();
Long instanceId = task.getInstanceId();
log.debug("[ProcessorRunnable-{}] start to run task(taskId={}&taskName={})", instanceId, taskId, task.getTaskName());
ThreadLocalStore.setTask(task);
ThreadLocalStore.setRuntimeMeta(workerRuntime);
// 0. Build the task context
WorkflowContext workflowContext = constructWorkflowContext();
TaskContext taskContext = constructTaskContext();
taskContext.setWorkflowContext(workflowContext);
// 1. Report that execution has started
reportStatus(TaskStatus.WORKER_PROCESSING, null, null, null);
ProcessResult processResult;
ExecuteType executeType = ExecuteType.valueOf(instanceInfo.getExecuteType());
// 2. Special handling for the root task of a broadcast job
if (TaskConstant.ROOT_TASK_NAME.equals(task.getTaskName()) && executeType == ExecuteType.BROADCAST) {
// Broadcast: run preProcess on this machine first; once it finishes, the TaskTracker generates sub-tasks for all Workers
handleBroadcastRootTask(instanceId, taskContext);
return;
}
// 3. Special handling for the last task (always on the same machine as the TaskTracker)
if (TaskConstant.LAST_TASK_NAME.equals(task.getTaskName())) {
handleLastTask(taskId, instanceId, taskContext, executeType);
return;
}
// 4. Run the processor
try {
processResult = processor.process(taskContext);
if (processResult == null) {
processResult = new ProcessResult(false, "ProcessResult can't be null");
}
} catch (Throwable e) {
log.warn("[ProcessorRunnable-{}] task(id={},name={}) process failed.", instanceId, taskContext.getTaskId(), taskContext.getTaskName(), e);
processResult = new ProcessResult(false, e.toString());
}
reportStatus(processResult.isSuccess() ? TaskStatus.WORKER_PROCESS_SUCCESS : TaskStatus.WORKER_PROCESS_FAILED, suit(processResult.getMsg()), null, workflowContext.getAppendedContextData());
}
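HeavyProcessorRunnable's run method delegates to innerRun and catches any Throwable, reporting it as WORKER_PROCESS_FAILED (an InterruptedException is silently ignored). innerRun first reports WORKER_PROCESSING, then handles the broadcast root task and the last task on separate paths, and otherwise calls processor.process: it reports WORKER_PROCESS_SUCCESS if processing succeeds and WORKER_PROCESS_FAILED if it fails, throws, or returns a null ProcessResult.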
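The reporting sequence can be reproduced without any PowerJob classes. In this sketch a Supplier stands in for BasicProcessor.process, a small hypothetical Result class for ProcessResult, and a list of strings for the status reports; the InterruptedException branch, ThreadLocalStore handling, message truncation and the broadcast-root/last-task paths are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

class RunnableSketch {
    // Hypothetical stand-in for ProcessResult
    static final class Result {
        final boolean success;
        final String msg;

        Result(boolean success, String msg) {
            this.success = success;
            this.msg = msg;
        }
    }

    // Statuses that would be reported to the TaskTracker, in order
    final List<String> reports = new ArrayList<>();

    // Outer guard: anything escaping innerRun becomes WORKER_PROCESS_FAILED
    void run(Supplier<Result> processor) {
        try {
            innerRun(processor);
        } catch (Throwable e) {
            reports.add("WORKER_PROCESS_FAILED:" + e);
        }
    }

    private void innerRun(Supplier<Result> processor) {
        reports.add("WORKER_PROCESSING");
        Result result;
        try {
            result = processor.get();
            // A null result is treated as a failure
            if (result == null) {
                result = new Result(false, "ProcessResult can't be null");
            }
        } catch (Throwable e) {
            // Exceptions from the processor are converted into a failed result
            result = new Result(false, e.toString());
        }
        reports.add((result.success ? "WORKER_PROCESS_SUCCESS:" : "WORKER_PROCESS_FAILED:") + result.msg);
    }

    public static void main(String[] args) {
        RunnableSketch ok = new RunnableSketch();
        ok.run(() -> new Result(true, "done"));
        System.out.println(ok.reports); // [WORKER_PROCESSING, WORKER_PROCESS_SUCCESS:done]

        RunnableSketch failed = new RunnableSketch();
        failed.run(() -> { throw new IllegalStateException("boom"); });
        System.out.println(failed.reports); // [WORKER_PROCESSING, WORKER_PROCESS_FAILED:java.lang.IllegalStateException: boom]
    }
}
```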
Summary
PowerJob's InstanceInfo defines threadConcurrency, the upper limit on processing threads per machine.
- HeavyTaskTracker computes maxDispatchNum (availablePtIps.size() * instanceInfo.getThreadConcurrency() * 2L) and then dispatches tasks round-robin via availablePtIps.get(index.getAndIncrement() % availablePtIps.size()).
- ProcessorTracker's calThreadPoolSize determines the pool size from ProcessorType, ExecuteType and TimeExpressionType: ProcessorType.PYTHON or ProcessorType.SHELL yields 1; ExecuteType.MAP_REDUCE, ExecuteType.MAP and TimeExpressionType.FREQUENT_TYPES yield instanceInfo.getThreadConcurrency(); anything else yields 2. initThreadPool creates an ArrayBlockingQueue of size 128 with AbortPolicy as the RejectedExecutionHandler, which throws RejectedExecutionException directly. submitTask wraps the TaskDO in a HeavyProcessorRunnable and submits it to threadPool; if that throws, success stays false, and only on success is a ProcessorReportTaskStatusReq sent to report the task as received.
- When TaskTrackerActor receives a ProcessorReportTaskStatusReq, it updates the status via updateTaskStatus and, for a status in FINISHED_STATUS, acknowledges with AskResponse.succeed(null).
- HeavyProcessorRunnable's run method delegates to innerRun and reports any Throwable as WORKER_PROCESS_FAILED; innerRun first reports WORKER_PROCESSING, then calls processor.process, reporting WORKER_PROCESS_SUCCESS on success and WORKER_PROCESS_FAILED otherwise.
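Putting these numbers together gives a rough, back-of-the-envelope check of when a single dispatch round could run into AbortPolicy. It assumes an idle ProcessorTracker, tasks spread evenly by round-robin with no pre-assigned addresses, no task finishing during the round, and it ignores any rounding in the 100-row batching; CapacitySketch is a hypothetical helper, not part of PowerJob.

```java
class CapacitySketch {
    static final int THREAD_POOL_QUEUE_MAX_SIZE = 128;

    // Tasks one round sends to each tracker: maxDispatchNum / availablePtIps.size()
    static long perTrackerDispatch(int threadConcurrency) {
        return threadConcurrency * 2L;
    }

    // Tasks an idle tracker can absorb before rejecting: busy threads plus queue slots
    static long perTrackerCapacity(int poolSize) {
        return poolSize + (long) THREAD_POOL_QUEUE_MAX_SIZE;
    }

    // Worst-case rejections in a single round under the assumptions above
    static long rejectedInOneRound(int threadConcurrency, int poolSize) {
        return Math.max(0L, perTrackerDispatch(threadConcurrency) - perTrackerCapacity(poolSize));
    }

    public static void main(String[] args) {
        // Java Map job, threadConcurrency = 100: pool of 100, 200 tasks vs 228 slots
        System.out.println(rejectedInOneRound(100, 100)); // 0
        // Java Map job, threadConcurrency = 200: 400 tasks vs 328 slots
        System.out.println(rejectedInOneRound(200, 200)); // 72
        // PYTHON/SHELL Map job, threadConcurrency = 100: pool of 1, 200 tasks vs 129 slots
        System.out.println(rejectedInOneRound(100, 1));   // 71
    }
}
```

Under these assumptions a Map/MapReduce job whose pool size equals threadConcurrency only overflows when 2c > c + 128, i.e. threadConcurrency above 128, while a ProcessorType.PYTHON or SHELL job, whose pool is fixed at 1, overflows once 2c > 129, i.e. threadConcurrency of 65 or more. Per the warning in submitTask, rejected tasks are expected to be dispatched to another ProcessorTracker.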