
Understanding How Linux epoll Works in Ten Questions

Author: Tencent Technology Engineering (騰訊技術工程), 2021-06-03 18:30:27



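epoll is an I/O event notification mechanism specific to Linux. I had long been curious about how epoll manages to handle millions of file descriptors efficiently. Recently I studied the epoll source code, and along the way I ran into quite a few questions about epoll's data structures and the author's design choices. I have collected them here as 10 questions and answer and analyze them one by one. The kernel source discussed in this article is version 2.6.39.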

Question 1: Can every file type be monitored by epoll?

Answer: No. Consider the following experiment:

#include <stdio.h>
#include <unistd.h>
#include <sys/epoll.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

#define MAX_EVENTS 1

int main (void)
{
    /* Create an epoll instance expected to watch about 100 fds
       (the size hint is ignored since Linux 2.6.8 but must be > 0) */
    int epfd = epoll_create(100);
    if (epfd < 0) {
        perror("epoll_create");
        return 1;
    }

    struct epoll_event *events;
    int nr_events, i;
    events = malloc(sizeof(struct epoll_event) * MAX_EVENTS);
    if (!events) {
        perror("malloc");
        close(epfd);
        return 1;
    }

    /* Open an ordinary text file */
    int target_fd = open("./11.txt", O_RDONLY);
    if (target_fd < 0) {
        perror("open");
        free(events);
        close(epfd);
        return 1;
    }
    printf("target_fd %d\n", target_fd);

    struct epoll_event ev;
    ev.data.fd = target_fd;   /* fd handed back to the application when epoll_wait returns */
    ev.events = EPOLLIN;      /* event types to monitor */
    int ret = epoll_ctl(epfd, EPOLL_CTL_ADD, target_fd, &ev); /* register the fd with the epoll instance */
    if (ret) {
        int err = errno;      /* save errno: printf may overwrite it */
        printf("ret %d, errno %d\n", ret, err);
        errno = err;
        perror("epoll_ctl");
        /* Nothing is registered, so epoll_wait(-1) would block forever: stop here */
        close(target_fd);
        free(events);
        close(epfd);
        return 1;
    }

    /* The process blocks in epoll_wait; a timeout of -1 means wait until a watched event occurs */
    nr_events = epoll_wait(epfd, events, MAX_EVENTS, -1);
    if (nr_events < 0) {
        perror("epoll_wait");
        close(target_fd);
        free(events);
        close(epfd);
        return 1;
    }
    for (i = 0; i < nr_events; i++) {
        /* Print each ready fd and its events */
        printf("event=%u on fd=%d\n", events[i].events, events[i].data.fd);
    }
    close(target_fd);
    free(events);
    close(epfd);
    return 0;
}

Compiling and running the code above prints the following:

gcc epoll_test.c -o epdemo 
./epdemo 
target_fd 4 
ret -1, errno 1 
epoll_ctl: Operation not permitted

The text file 11.txt was opened normally (fd=4), but calling epoll_ctl to monitor this fd failed with ret=-1, errno 1 (EPERM) and the message "Operation not permitted". This error code means that the fd cannot be monitored by epoll.

So which fds can be monitored by epoll?

Only file types whose underlying driver implements the poll function in file_operations can be monitored by epoll! Socket files implement poll, which is why they can be monitored by epoll. struct file_operations is declared in include/linux/fs.h.
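To reproduce this on a current kernel without preparing 11.txt, here is a small sketch (not code from the article) that tries to register two kinds of fd with epoll_ctl(): an anonymous regular file created by tmpfile(), and the read end of a pipe, whose file_operations do implement poll. The helper names try_epoll_add, regular_file_result and pipe_result are hypothetical. On Linux one would expect EPERM for the regular file and success (0) for the pipe.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Try to register fd for EPOLLIN on a fresh epoll instance.
 * Returns 0 on success, otherwise the errno set by epoll_ctl(). */
static int try_epoll_add(int fd)
{
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return errno;

    struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
    int err = 0;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
        err = errno;               /* EPERM: the file has no poll support */
    close(epfd);
    return err;
}

/* Regular file (what tmpfile() creates): expected to fail with EPERM. */
static int regular_file_result(void)
{
    FILE *f = tmpfile();           /* anonymous file, removed on fclose() */
    if (f == NULL)
        return -1;
    int err = try_epoll_add(fileno(f));
    fclose(f);
    return err;
}

/* Pipe: the pipe's file_operations implement poll, so this should succeed. */
static int pipe_result(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int err = try_epoll_add(p[0]);
    close(p[0]);
    close(p[1]);
    return err;
}
```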

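Question 2: What is ep->wq for?

Answer: wq is a wait queue that holds every process that has called epoll_wait() on a given epoll instance.

When a process calls epoll_wait() and no event has occurred yet, the process must be suspended (placed on ep->wq); when an event occurs on a file monitored by the epoll instance, the processes on ep->wq must be woken so they can go on to run their user-space logic. A wait queue is used to track the processes interested in this epoll because sometimes more than one process calls epoll_wait(): when several processes watch the same epoll instance, the sleeping processes can be woken one by one through this wait queue.

If several processes watch the same epoll instance, which one is woken first when an event occurs, and which next? Or are they all woken at once? This leads to a problem known as the "thundering herd".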
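The sleep/wake cycle on ep->wq can be observed from user space. In this minimal sketch (the helper wait_until_woken is a hypothetical name, and a pipe stands in for a socket), epoll_wait() with a zero timeout first confirms that nothing is ready; the parent then blocks in epoll_wait() with timeout -1 and only returns after a forked child writes one byte into the pipe. In kernel terms, the parent spends that time parked on ep->wq.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Returns 1 if the parent, blocked in epoll_wait(-1), was woken by the
 * child's write; 0 if something else happened; -1 on setup errors. */
static int wait_until_woken(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = p[0] };
    struct epoll_event out;
    pid_t pid;
    int n, result = -1;

    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto done;

    /* Nothing is ready yet: a zero timeout returns 0 instead of sleeping. */
    if (epoll_wait(epfd, &out, 1, 0) != 0)
        goto done;

    pid = fork();
    if (pid < 0)
        goto done;
    if (pid == 0) {
        /* Child: give the parent time to fall asleep, then produce an event. */
        struct timespec delay = { 0, 100 * 1000 * 1000 };   /* 100 ms */
        nanosleep(&delay, NULL);
        _exit(write(p[1], "x", 1) == 1 ? 0 : 1);
    }

    /* Parent: timeout -1 sleeps (on ep->wq, in kernel terms) until woken. */
    n = epoll_wait(epfd, &out, 1, -1);
    waitpid(pid, NULL, 0);
    result = (n == 1 && out.data.fd == p[0]) ? 1 : 0;

done:
    if (epfd >= 0)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```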

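Question 3: What is the epoll thundering herd?

Answer: it is the situation where several processes wait on ep->wq, all of them are woken when an event fires, but only one of them can actually proceed. The other processes were woken for nothing, and the wasted work can drive up the system load. The following code shows the epoll thundering herd in action: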

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>
#define PROCESS_NUM 10
static int create_and_bind (const char *port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) {
        perror("socket");
        return -1;
    }
    struct sockaddr_in serveraddr;
    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serveraddr.sin_port = htons(atoi(port));
    if (bind(fd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)) == -1) {
        perror("bind");
        close(fd);
        return -1;
    }
    return fd;
}
 
static int make_socket_non_blocking (int sfd) 
{ 
    int flags, s; 
 
    flags = fcntl (sfd, F_GETFL, 0); 
    if (flags == -1) 
    { 
        perror ("fcntl"); 
        return -1; 
    } 
 
    flags |= O_NONBLOCK; 
    s = fcntl (sfd, F_SETFL, flags); 
    if (s == -1) 
    { 
        perror ("fcntl"); 
        return -1; 
    } 
 
    return 0; 
} 
 
#define MAXEVENTS 64 
 
int main (int argc, char *argv[]) 
{ 
    int sfd, s; 
    int efd; 
    struct epoll_event event; 
    struct epoll_event *events; 
 
    sfd = create_and_bind("8001"); 
    if (sfd == -1) 
        abort (); 
 
    s = make_socket_non_blocking (sfd); 
    if (s == -1) 
        abort (); 
 
    s = listen(sfd, SOMAXCONN); 
    if (s == -1) 
    { 
        perror ("listen"); 
        abort (); 
    } 
 
    efd = epoll_create(MAXEVENTS); 
    if (efd == -1) 
    { 
        perror("epoll_create"); 
        abort(); 
    } 
 
    event.data.fd = sfd; 
    //event.events = EPOLLIN | EPOLLET; 
    event.events = EPOLLIN; 
    s = epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &event); 
    if (s == -1) 
    { 
        perror("epoll_ctl"); 
        abort(); 
    } 
 
    /* Buffer where events are returned */
    events = calloc(MAXEVENTS, sizeof event);
    if (events == NULL)
    {
        perror("calloc");
        abort();
    }
    int k;
    for (k = 0; k < PROCESS_NUM; k++)
    {
        pid_t pid = fork();
        if (pid == -1)
        {
            perror("fork");
            abort();
        }
        if (pid == 0)
        {
            /* Child: every child blocks in epoll_wait() on the same epoll instance */
            while (1)
            {
                int n, i;
                n = epoll_wait(efd, events, MAXEVENTS, -1);
                printf("process %d return from epoll_wait!\n", getpid());
                for (i = 0; i < n; i++)
                {
                    if ((events[i].events & EPOLLERR) || (events[i].events & EPOLLHUP) || (!(events[i].events & EPOLLIN)))
                    {
                        /* An error has occurred on this fd, or the socket is not ready for reading (why were we notified then?) */
                        fprintf(stderr, "epoll error\n");
                        close(events[i].data.fd);
                        continue;
                    }
                    else if (sfd == events[i].data.fd)
                    {
                        /* We have a notification on the listening socket, which means one or more incoming connections. */
                        struct sockaddr in_addr;
                        socklen_t in_len = sizeof in_addr;
                        int infd = accept(sfd, &in_addr, &in_len);
                        if (infd == -1)
                        {
                            /* Typically another process accepted the connection first: a wasted wake-up */
                            printf("process %d accept failed!\n", getpid());
                            break;
                        }
                        printf("process %d accept succeeded!\n", getpid());

                        /* The demo only counts wake-ups, so close the new connection right away */
                        close(infd);
                    }
                }
            }
        }
    }
    /* Parent: the children never exit, so this blocks until they are killed */
    while (wait(NULL) > 0)
        ;
    free(events);
    close(sfd);
    return EXIT_SUCCESS;
}

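The server's listening socket fd is added to the set monitored by epoll, so when a client tries to connect, the event fires and epoll_wait returns. If 10 processes are blocked in epoll_wait on the same epoll instance at that moment, the thundering herd appears: all 10 processes are woken, but only one of them can accept the connection.

To mitigate the epoll thundering herd, later kernels added the EPOLLEXCLUSIVE flag (Linux 4.5) and the SO_REUSEPORT socket option (Linux 3.9). As I understand it, the two solutions differ as follows: EPOLLEXCLUSIVE acts at wake-up time and wakes only the process at the front of the queue instead of all of them (the epoll_ctl(2) man page promises only that "one or more" waiters are woken); SO_REUSEPORT acts when connections are distributed: each process binds its own listening socket to the same port, so it is as if each process had its own independent epoll instance, and the kernel decides which socket, and therefore which epoll, each new connection goes to.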
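A minimal sketch of the EPOLLEXCLUSIVE approach, assuming Linux 4.5 or later: each worker creates its own epoll instance and attaches the shared fd with EPOLLIN | EPOLLEXCLUSIVE. In a real server the shared fd would be the listening socket; here a pipe stands in for it so the sketch runs without networking. The helpers attach_exclusive and mod_exclusive_errno are hypothetical names. The second helper shows a documented restriction: the flag is only accepted with EPOLL_CTL_ADD, and EPOLL_CTL_MOD with it fails with EINVAL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

#ifndef EPOLLEXCLUSIVE
#define EPOLLEXCLUSIVE (1u << 28)   /* Linux >= 4.5; older headers lack it */
#endif

/* Give each of nworkers its own epoll instance and attach shared_fd to all of
 * them with EPOLLEXCLUSIVE. Returns 0 on success, otherwise an errno value. */
static int attach_exclusive(int shared_fd, int epfds[], int nworkers)
{
    for (int i = 0; i < nworkers; i++) {
        epfds[i] = epoll_create1(0);
        if (epfds[i] < 0)
            return errno;
        struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE,
                                  .data.fd = shared_fd };
        if (epoll_ctl(epfds[i], EPOLL_CTL_ADD, shared_fd, &ev) < 0)
            return errno;
    }
    return 0;
}

/* EPOLLEXCLUSIVE is only accepted by EPOLL_CTL_ADD; EPOLL_CTL_MOD with it
 * fails with EINVAL. Returns that errno (0 if the call succeeds). */
static int mod_exclusive_errno(int epfd, int fd)
{
    struct epoll_event ev = { .events = EPOLLIN | EPOLLEXCLUSIVE, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev) < 0 ? errno : 0;
}
```

In a server, each worker would then loop on its own epfds[i] and call accept() on the shared listening socket.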

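Question 4: What is ep->poll_wait for?

Answer: ep->poll_wait is another wait queue in an epoll instance. When a monitored file is itself an epoll instance, this wait queue is used to handle recursive wake-ups.

While reading the kernel code, I found ep->wq fairly easy to understand, but I noticed that alongside the ep->wq wake-up there is also an ep->poll_wait wake-up. For example, the following snippet appears many times in eventpoll.c: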

/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
    list_add_tail(&epi->rdllink, &ep->rdllist);

    /* Notify waiting tasks that events are available */
    if (waitqueue_active(&ep->wq))
        wake_up_locked(&ep->wq);
    if (waitqueue_active(&ep->poll_wait))
        pwake++;
}

spin_unlock_irqrestore(&ep->lock, flags);

atomic_long_inc(&ep->user->epoll_watches);

/* We have to call this outside the lock */
if (pwake)
    ep_poll_safewake(&ep->poll_wait);

After consulting a lot of material I finally understood: epoll is itself a file type, and its underlying driver also implements the poll function in file_operations, so an epoll fd can be monitored by other epoll instances. An epoll fd only ever reports "read ready" events: when a non-epoll file monitored by an epoll instance becomes read-ready, that epoll instance itself becomes read-ready too.

So if one epoll instance monitors another, recursion occurs. For example, as shown in the figure:

epollfd1 monitors 2 non-epoll fds;

epollfd2 monitors epollfd1 and 2 non-epoll fds.

If one of the 2 fds monitored by epollfd1 becomes readable, that fd's ep_poll_callback puts it on epollfd1's rdllist. At that point epollfd1 itself also becomes readable, so the kernel must find epollfd2 on epollfd1's poll_wait wait queue and call the ep_poll_callback of epollfd1's epitem (which puts epollfd1 on epollfd2's rdllist). So ep->poll_wait exists to handle epoll instances that monitor other epoll instances.
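The nesting described above can be reproduced directly. The sketch below (a hypothetical helper, nested_ready, with a pipe as the non-epoll fd) registers a pipe with epollfd1, registers epollfd1 with epollfd2, writes one byte into the pipe and then asks epollfd2 what is ready. If the recursive wake-up works as described, epollfd2 reports epollfd1 as readable (EPOLLIN).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* epollfd2 -> epollfd1 -> pipe. Returns 1 if, after the pipe becomes
 * readable, epollfd2 reports epollfd1 with EPOLLIN; 0 otherwise; -1 on error. */
static int nested_ready(void)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    int ep1 = epoll_create1(0);
    int ep2 = epoll_create1(0);
    int result = -1;
    struct epoll_event ev = { .events = EPOLLIN };
    struct epoll_event out;

    if (ep1 < 0 || ep2 < 0)
        goto done;
    ev.data.fd = p[0];                     /* epollfd1 watches the pipe */
    if (epoll_ctl(ep1, EPOLL_CTL_ADD, p[0], &ev) < 0)
        goto done;
    ev.data.fd = ep1;                      /* epollfd2 watches epollfd1 */
    if (epoll_ctl(ep2, EPOLL_CTL_ADD, ep1, &ev) < 0)
        goto done;

    /* pipe readable -> ep_poll_callback on epollfd1 -> wake-up through
     * epollfd1's poll_wait queue -> epollfd1 lands on epollfd2's ready list */
    if (write(p[1], "a", 1) != 1)
        goto done;
    result = (epoll_wait(ep2, &out, 1, 0) == 1 && out.data.fd == ep1 &&
              (out.events & EPOLLIN)) ? 1 : 0;

done:
    if (ep2 >= 0)
        close(ep2);
    if (ep1 >= 0)
        close(ep1);
    close(p[0]);
    close(p[1]);
    return result;
}
```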

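Question 5: What is ep->rdllist for?

Answer: it is the linked list of fds in the epoll instance that have ready events.

By scanning ep->rdllist, the kernel can directly get the fds on which events are currently pending, instead of scanning every monitored fd as select()/poll() do and picking out the ready ones. The cost therefore depends on the number of ready fds rather than on the number of monitored fds, and this is the main reason epoll performs far better than select/poll when most monitored fds are idle.

You may now wonder: why do ready fds "actively" show up on rdllist, so that no full scan is needed to find them? The reason is that each time epoll_ctl adds a monitored fd, a callback, ep_poll_callback, is registered for that fd. When the network card receives a packet it raises an interrupt, and the interrupt handling path then calls ep_poll_callback, which adds the fd's "epitem" to the epoll instance's rdllist.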
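The effect of rdllist is visible from user space: however many fds are registered, epoll_wait() hands back only the ready ones. In this sketch (hypothetical helper only_ready_index; pipes stand in for sockets; NPIPES = 64 is an arbitrary choice), only one of 64 pipes is made readable, and epoll_wait() returns exactly one event carrying that pipe's index in data.u32.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 64

/* Watch NPIPES pipes, make only pipe `which` readable, and return the index
 * carried by the single event epoll_wait() reports (-1 otherwise). */
static int only_ready_index(int which)
{
    int fds[NPIPES][2];
    int opened = 0, result = -1;
    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    for (; opened < NPIPES; opened++) {
        if (pipe(fds[opened]) < 0)
            goto done;
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = (uint32_t)opened };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[opened][0], &ev) < 0) {
            opened++;                      /* this pipe is open: close it too */
            goto done;
        }
    }

    if (which < 0 || which >= NPIPES || write(fds[which][1], "x", 1) != 1)
        goto done;

    struct epoll_event out[NPIPES];
    int n = epoll_wait(epfd, out, NPIPES, 0);   /* only ready items come back */
    if (n == 1)
        result = (int)out[0].data.u32;

done:
    for (int i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return result;
}
```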

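Question 6: What is ep->ovflist for?

Answer: while rdllist is being used to hand events to user space, ovflist collects the fds that become ready in the meantime, so that rdllist can be processed without holding ep->lock.

Once an epoll instance has ready events, the kernel must scan rdllist and return the ready fds to user space. This is done by the ep_scan_ready_list function, where sproc is a callback (namely ep_send_events_proc) that copies the data from kernel space to user space.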

/** 
 * ep_scan_ready_list - Scans the ready list in a way that makes possible for the scan code, to call f_op->poll(). Also allows for O(NumReady) performance. 
 * @ep: Pointer to the epoll private data structure. 
 * @sproc: Pointer to the scan callback. 
 * @priv: Private opaque data passed to the @sproc callback. 
 * Returns: The same integer error code returned by the @sproc callback. 
 */ 
static int ep_scan_ready_list(struct eventpoll *ep, 
                  int (*sproc)(struct eventpoll *, 
                       struct list_head *, void *), 
                  void *priv)

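Because rdllist is very busy (adding monitored files, modifying them, events firing and so on all touch rdllist), the ep->mtx mutex is held while data is copied to user space, to keep epoll's own data structures thread-safe; during that time, any other code path that tries to take ep->mtx blocks on it and sleeps.

But while the lock is held, new events may keep arriving and call ep_poll_callback (ep_poll_callback does not take ep->mtx, so it never sleeps). Those new events need somewhere to be collected, or they would be lost, and the list that temporarily collects them is ovflist. My understanding is that with ovflist, new events no longer have to compete with ep_send_events_proc for the spinlock (ep->lock) in order to write into rdllist, and ep_send_events_proc can safely modify rdllist without holding ep->lock.

Reading the code, you will also find a txlist list, which is used for the final copy to user space: rdllist first moves all of its entries to txlist and is then emptied. ep_send_events_proc walks txlist and copies each entry to user space; after a successful copy, a level-triggered (LT) item is put back on rdllist so that the next epoll_wait can pick it up again.

fds on ovflist are merged into rdllist to wait for the next scan, and any fds left unprocessed on txlist are also merged back into rdllist at the end. So the three lists relate as follows: rdllist is handed over to txlist for copying, events that fire during the copy are collected on ovflist, and afterwards both ovflist and any leftovers on txlist flow back into rdllist.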
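The interplay of rdllist, txlist and ovflist can be modelled in a few dozen lines. The following is a heavily simplified, single-threaded user-space model written for this explanation, not kernel code: it has no locks at all, and scan_begin / scan_send / scan_end are hypothetical names for the three phases of ep_scan_ready_list (splice rdllist into txlist, copy via ep_send_events_proc, merge back). As in the kernel, each item has separate links for rdllist/txlist and for ovflist, so an item that fires again while it sits on txlist is not corrupted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified epitem: just the fd and the list links. */
struct item {
    int fd;
    bool on_rdl;              /* on rdllist or txlist (kernel: rdllink linked) */
    bool on_ovf;              /* on ovflist (kernel: epi->next != EP_UNACTIVE_PTR) */
    struct item *rdl_next;    /* link for rdllist / txlist */
    struct item *ovf_next;    /* separate link for ovflist */
};

struct list { struct item *head, *tail; };

struct ep_model {
    struct list rdllist;      /* ready items, FIFO */
    struct item *ovflist;     /* items that fired during a scan, LIFO */
    bool scanning;            /* true while rdllist's items sit on a txlist */
};

static void append(struct list *l, struct item *it)
{
    it->rdl_next = NULL;
    it->on_rdl = true;
    if (l->tail)
        l->tail->rdl_next = it;
    else
        l->head = it;
    l->tail = it;
}

/* Stand-in for ep_poll_callback(). */
static void on_event(struct ep_model *ep, struct item *it)
{
    if (ep->scanning) {                 /* rdllist is busy: park on ovflist */
        if (!it->on_ovf) {
            it->on_ovf = true;
            it->ovf_next = ep->ovflist;
            ep->ovflist = it;
        }
    } else if (!it->on_rdl) {
        append(&ep->rdllist, it);
    }
}

/* Phase 1: splice rdllist into txlist; new events now go to ovflist. */
static struct list scan_begin(struct ep_model *ep)
{
    struct list tx = ep->rdllist;
    ep->rdllist = (struct list){ NULL, NULL };
    ep->scanning = true;
    return tx;
}

/* Phase 2, stand-in for ep_send_events_proc(): deliver up to max fds. */
static int scan_send(struct ep_model *ep, struct list *tx, int out[], int max,
                     bool level_triggered)
{
    int n = 0;
    while (tx->head && n < max) {
        struct item *it = tx->head;
        tx->head = it->rdl_next;
        if (tx->head == NULL)
            tx->tail = NULL;
        it->on_rdl = false;
        out[n++] = it->fd;
        if (level_triggered)            /* LT: back on rdllist for next time */
            append(&ep->rdllist, it);
    }
    return n;
}

/* Phase 3: merge ovflist, then leftover txlist items, back into rdllist. */
static void scan_end(struct ep_model *ep, struct list *tx)
{
    ep->scanning = false;
    for (struct item *it = ep->ovflist, *next; it; it = next) {
        next = it->ovf_next;
        it->on_ovf = false;
        it->ovf_next = NULL;
        if (!it->on_rdl)                /* skip items already on a list */
            append(&ep->rdllist, it);
    }
    ep->ovflist = NULL;
    if (tx->head) {                     /* like list_splice(): to the front */
        tx->tail->rdl_next = ep->rdllist.head;
        if (ep->rdllist.tail == NULL)
            ep->rdllist.tail = tx->tail;
        ep->rdllist.head = tx->head;
    }
    *tx = (struct list){ NULL, NULL };
}
```

For example, if fds 3 and 4 are ready, a scan starts, and fd 5 fires (even twice) during the copy, an edge-triggered scan delivers 3 and 4 and leaves only 5 on rdllist for the next epoll_wait; a level-triggered scan would also put 3 and 4 back.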

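Question 7: What is the epitem->pwqlist queue for?

Answer: it holds this epitem's poll wait-queue entries.

First, what is an epitem? epitem is one of the most important data structures in epoll: it is the basic element of both the red-black tree and rdllist. The information about each file to monitor and the events of interest is wrapped in an epitem structure.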

struct epitem {
    struct rb_node rbn;          // links the item into the red-black tree
    struct list_head rdllink;    // links the item into rdllist
    struct epoll_filefd ffd;     // file pointer and fd of the monitored file
    struct list_head pwqlist;    // poll wait queue
    struct eventpoll *ep;        // the epoll instance this item belongs to
    struct epoll_event event;    // the events of interest
    /* other members omitted */
};

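Recall from above that each time the user calls epoll_ctl() to add a monitored file, a callback, ep_poll_callback, is registered for that file; when the network card receives data, the softirq calls this ep_poll_callback, which adds the epitem to ep->rdllist.

pwqlist is what this registration of ep_poll_callback is tied to.

When epoll_ctl() adds a monitored file, the kernel creates an eppoll_entry object for the epitem and installs ep_poll_callback as the wake-up function (of type wait_queue_func_t) of the wait_queue_t embedded in that eppoll_entry. Why is pwqlist a queue rather than a single eppoll_entry? Because different file types may use different numbers of wait queues in their file_operations->poll implementation. Most use one, but there are exceptions: the "scullpipe" device, for example, uses 2 wait queues.

The relationship among pwqlist, epitem, fd, eppoll_entry and ep_poll_callback is therefore: each epitem keeps, on its pwqlist, one eppoll_entry per wait queue that the fd's poll function registers on; each eppoll_entry's wait entry is hooked onto that wait queue of the fd, with ep_poll_callback as its wake-up function.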
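The role of pwqlist can be illustrated with a small user-space model. Everything below is hypothetical and simplified (for instance, the real wake-up function also receives mode, sync and key arguments): model_queue_proc plays the queueing callback that a file's poll() reaches through poll_wait() (ep_ptable_queue_proc in eventpoll.c), model_poll_callback plays ep_poll_callback, and two_queue_dev imitates a scullpipe-like device with separate reader and writer queues. Because the device's poll registers on two queues, the epitem ends up with two eppoll_entry objects on its pwqlist, and a wake-up on either queue marks the epitem ready.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Wait queue entry with a wake-up function, like wait_queue_t and its func. */
struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *);
struct wait_entry { wake_fn func; struct wait_entry *next; };
struct wait_queue { struct wait_entry *head; };

struct epitem;
struct eppoll_entry {
    struct wait_entry wait;      /* first member: hooked onto a file's queue */
    struct epitem *base;         /* owning epitem */
    struct eppoll_entry *next;   /* link in epitem->pwqlist */
};

struct epitem { struct eppoll_entry *pwqlist; int nwait; int ready; };

/* Stand-in for ep_poll_callback(): mark the owning epitem ready. */
static int model_poll_callback(struct wait_entry *w)
{
    struct eppoll_entry *pwq = (struct eppoll_entry *)w;  /* kernel: container_of */
    pwq->base->ready = 1;
    return 1;
}

/* Stand-in for ep_ptable_queue_proc(): runs once for every wait queue the
 * file's poll() passes to poll_wait(). Returns 0, or -1 if out of memory. */
static int model_queue_proc(struct epitem *epi, struct wait_queue *wq)
{
    struct eppoll_entry *pwq = malloc(sizeof *pwq);
    if (pwq == NULL)
        return -1;
    pwq->wait.func = model_poll_callback;
    pwq->wait.next = wq->head;           /* hook onto the file's wait queue */
    wq->head = &pwq->wait;
    pwq->base = epi;
    pwq->next = epi->pwqlist;            /* remember it on the epitem */
    epi->pwqlist = pwq;
    epi->nwait++;
    return 0;
}

/* scullpipe-like device: one queue for readers, one for writers. */
struct two_queue_dev { struct wait_queue inq, outq; };

/* The device's poll(): registers on both queues. */
static int two_queue_poll(struct two_queue_dev *dev, struct epitem *epi)
{
    if (model_queue_proc(epi, &dev->inq) < 0 || model_queue_proc(epi, &dev->outq) < 0)
        return -1;
    return 0;
}

/* Like wake_up(): call the wake-up function of every entry on the queue. */
static void model_wake_up(struct wait_queue *wq)
{
    for (struct wait_entry *w = wq->head; w != NULL; w = w->next)
        w->func(w);
}

/* Free the entries. Real code must first unhook them from the file's queues
 * (ep_unregister_pollwait() in the kernel); here the caller drops the device. */
static void free_pwqlist(struct epitem *epi)
{
    while (epi->pwqlist != NULL) {
        struct eppoll_entry *next = epi->pwqlist->next;
        free(epi->pwqlist);
        epi->pwqlist = next;
    }
    epi->nwait = 0;
}
```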

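Question 8: How do the three locks epmutex, ep->mtx and ep->lock differ?

Answer: they differ in granularity and purpose.

epmutex is a global mutex, used in only 3 places in epoll: in ep_free() when an epoll instance is destroyed, in eventpoll_release_file() when a closed file is cleaned out of epoll, and in epoll_ctl() to avoid deadlocks when epoll instances are nested. My understanding is that epmutex has the coarsest granularity and handles synchronization that spans epoll instances.
ep->mtx is a mutex inside an epoll instance. It is used in ep_scan_ready_list() when scanning the ready list, in eventpoll_release_file() when ep_remove() deletes a monitored file, in ep_loop_check_proc() when checking for loops or excessively deep nesting among epoll instances, and in epoll_ctl() when monitored files are added, deleted or modified. All of these functions access the epoll instance's rdllist or red-black tree, so my understanding is that ep->mtx protects the thread safety of the instance's internal data structures.
ep->lock is a spinlock inside an epoll instance that protects ep->rdllist. A spinlock never puts the caller to sleep when the lock is unavailable, and ep_poll_callback may run in interrupt context, where sleeping is not allowed; so ep_poll_callback can only use ep->lock, otherwise events would be lost.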

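Question 9: Why does epoll use a red-black tree?

Answer: to maintain all the epitems of an epoll instance.

When user space calls epoll_ctl() to manage the files monitored by epoll, adding, deleting, modifying and looking up entries must all be efficient. Especially when an epoll instance monitors millions of files, the choice of data structure can make a very large difference in efficiency.

In terms of time (insert, delete, modify, lookup, in-order traversal) and space (storage size, scalability), the red-black tree is an excellent data structure, since insert, delete and lookup are all O(log n) (at the cost of a fairly complex implementation). In what order are the epitems in epoll's red-black tree organized? Reading the code shows that the addresses of the 2 file pointers are compared first, and if they are equal, the fds are compared.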

/* Compare RB tree keys */ 
static inline int ep_cmp_ffd(struct epoll_filefd *p1, struct epoll_filefd *p2) 
{ 
    return (p1->file > p2->file ? +1 : (p1->file < p2->file ? -1 : p1->fd - p2->fd)); 
}
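The ordering can be reproduced in user space. This sketch is a hypothetical mirror of struct epoll_filefd and ep_cmp_ffd; it compares the pointers through uintptr_t, because ISO C leaves relational comparison of pointers to unrelated objects undefined, and it compares fds by sign rather than by subtraction. A sorted array searched with bsearch() stands in for the red-black tree: it gives the same O(log n) lookup, but not the tree's O(log n) insertion and deletion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical mirror of struct epoll_filefd, the rb-tree key. */
struct filefd {
    const void *file;   /* stands in for struct file * */
    int fd;
};

/* Same ordering as ep_cmp_ffd(): file pointer address first, then fd. */
static int cmp_ffd(const void *a, const void *b)
{
    const struct filefd *p1 = a, *p2 = b;
    uintptr_t f1 = (uintptr_t)p1->file, f2 = (uintptr_t)p2->file;
    if (f1 != f2)
        return f1 > f2 ? 1 : -1;
    return (p1->fd > p2->fd) - (p1->fd < p2->fd);
}

/* Look up (file, fd) in keys[], which must be sorted with cmp_ffd. */
static const struct filefd *find_ffd(const struct filefd *keys, size_t n,
                                     const void *file, int fd)
{
    struct filefd key = { file, fd };
    return bsearch(&key, keys, n, sizeof key, cmp_ffd);
}
```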

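So each epoll instance holds the root of a red-black tree whose nodes are its epitems, linked in through epitem->rbn and ordered by (file pointer, fd) as above.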

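Question 10: What are level-triggered and edge-triggered modes?

Answer: level-triggered (LT) and edge-triggered (ET) are the two modes in which epoll_wait reports events; the mode is chosen per fd, with the EPOLLET flag in epoll_ctl() selecting edge-triggered. Level-triggered focuses on the state of the data (for reads, the buffer is not empty; for writes, the buffer is not full): as long as that holds, epoll_wait keeps reporting the fd as ready. LT is epoll's default mode.

Edge-triggered focuses on change: epoll_wait returns only when a change happens on the monitored file (for reads, new data is written into the buffer; for writes, data is taken out of the buffer).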
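One practical consequence of the edge-triggered definition: since no new notification arrives until the data changes again, an ET reader usually sets the fd to O_NONBLOCK and keeps reading until read() fails with EAGAIN. A minimal sketch, using a hypothetical helper drain_fd that reads 2 bytes per call to mirror the experiment's server:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read from a non-blocking fd until EAGAIN, 2 bytes per read() call.
 * Returns the number of bytes stored in buf (at most cap), or -1 on error.
 * If it returns cap, data may remain and the caller should call it again. */
static long drain_fd(int fd, char *buf, size_t cap)
{
    size_t total = 0;
    while (total < cap) {
        size_t chunk = cap - total < 2 ? cap - total : 2;
        ssize_t n = read(fd, buf + total, chunk);
        if (n > 0)
            total += (size_t)n;
        else if (n == 0)
            break;                          /* EOF: the writer closed */
        else if (errno == EINTR)
            continue;                       /* interrupted: retry */
        else if (errno == EAGAIN || errno == EWOULDBLOCK)
            break;                          /* drained: safe to epoll_wait again */
        else
            return -1;
    }
    return (long)total;
}
```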

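Let's run an experiment to see the difference between the two modes: in both cases the client types the 8 characters "abcdefgh", and the server receives 2 characters at a time.

In level-triggered mode, the client's 8 characters trigger a read-ready event, and because the monitored file still has data to read, epoll_wait keeps reporting read-ready: in each of 4 loop iterations the server gets 2 characters, until all 8 have been read.

In edge-triggered mode, the client likewise types 8 characters, but after the server reads 2 characters in one loop iteration, the read-ready event is gone. Only after the client sends another string does the server notice the "change" and read the next 2 characters, "c" and "d", from the buffer.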
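The experiment can be approximated without a network client. In this sketch (assumptions: an AF_UNIX socketpair replaces the TCP connection, and epoll_wait() is called with a zero timeout instead of blocking), the "client" writes "abcdefgh", the "server" reads 2 bytes per round for 4 rounds, and then the client writes "ij". The helper name lt_et_demo is hypothetical. With LT one would expect 4 notifications; with ET only 1, and after the second write the ET server reads "cd", data left over from the first write.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* sv[1] plays the client, sv[0] the server. Returns how many of 4 polling
 * rounds reported sv[0] as readable (reading 2 bytes each time), and stores
 * in next2[] the 2 bytes read after the client sends "ij". -1 on error. */
static int lt_et_demo(int edge_triggered, char next2[2])
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    int epfd = epoll_create1(0);
    struct epoll_event ev = { .events = EPOLLIN, .data.fd = sv[0] };
    struct epoll_event got;
    char buf[2];
    int hits = -1;

    if (edge_triggered)
        ev.events |= EPOLLET;
    if (epfd < 0 || epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) < 0)
        goto done;
    if (write(sv[1], "abcdefgh", 8) != 8)         /* client sends 8 characters */
        goto done;

    hits = 0;
    for (int round = 0; round < 4 && hits >= 0; round++) {
        if (epoll_wait(epfd, &got, 1, 0) == 1) {  /* timeout 0: readiness check */
            hits++;
            if (read(sv[0], buf, sizeof buf) != 2)  /* server reads 2 bytes */
                hits = -1;
        }
    }

    /* A second write is a new "change", so even ET reports the fd again. */
    if (hits >= 0 && (write(sv[1], "ij", 2) != 2 ||
                      epoll_wait(epfd, &got, 1, 0) != 1 ||
                      read(sv[0], next2, 2) != 2))
        hits = -1;

done:
    if (epfd >= 0)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return hits;
}
```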

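Summary

Through these 10 questions, which are really 10 different vantage points on the grand edifice of epoll, this article has covered epoll's overall workflow: from monitoring events, through the organization of its internal data structures and event handling, to epoll_wait returning. Finally, here is a diagram of the relationships among the epoll data structures. It answered many of my questions while I was learning epoll; I would call it a lighthouse.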


Editor: Wu Xiaoyan (武曉燕). Source: 51CTO Column
