自拍偷在线精品自拍偷,亚洲欧美中文日韩v在线观看不卡

<blockquote id="p6hkl"></blockquote>

51CTO首頁(yè)

AI.x社區(qū)

軟考社區(qū)

免費(fèi)課

企業(yè)培訓(xùn)

鴻蒙開(kāi)發(fā)者社區(qū)

WOT技術(shù)大會(huì)

公眾號(hào)矩陣

移動(dòng)端

視頻課免費(fèi)課排行榜短視頻直播課軟考學(xué)堂

全部課程軟考華為認(rèn)證廠(chǎng)商認(rèn)證 IT技術(shù)PMP項(xiàng)目管理免費(fèi)題庫(kù)

在線(xiàn)學(xué)習(xí)

文章資源問(wèn)答課堂專(zhuān)欄直播

51CTO

鴻蒙開(kāi)發(fā)者社區(qū)

51CTO技術(shù)棧

51CTO官微

51CTO學(xué)堂

51CTO博客

CTO訓(xùn)練營(yíng)

鴻蒙開(kāi)發(fā)者社區(qū)訂閱號(hào)

51CTO軟考

51CTO學(xué)堂APP

51CTO學(xué)堂企業(yè)版APP

鴻蒙開(kāi)發(fā)者社區(qū)視頻號(hào)

51CTO軟考題庫(kù)

賬號(hào)設(shè)置退出

計(jì)算時(shí)間序列周期的三種方法

作者：Shashindra Silva 2023-02-21 14:58:12

開(kāi)發(fā) 前端

周期是數(shù)據(jù)中出現(xiàn)重復(fù)模式所需的時(shí)間長(zhǎng)度。更具體地說(shuō)，它是模式的一個(gè)完整周期的持續(xù)時(shí)間。在這篇文章中，將介紹計(jì)算時(shí)間序列周期的三種不同方法。

周期是數(shù)據(jù)中出現(xiàn)重復(fù)模式所需的時(shí)間長(zhǎng)度。更具體地說(shuō)，它是模式的一個(gè)完整周期的持續(xù)時(shí)間。在這篇文章中，將介紹計(jì)算時(shí)間序列周期的三種不同方法。

我們使用City of Ottawa 數(shù)據(jù)集，主要關(guān)注的是每天的服務(wù)呼叫數(shù)量。所以不需要對(duì)病房名稱(chēng)進(jìn)行初始數(shù)據(jù)處理。Ottawa 數(shù)據(jù)集在渥太華市提供的數(shù)據(jù)門(mén)戶(hù)網(wǎng)站上免費(fèi)提供。

讓我們加載2019-2022年的這些數(shù)據(jù)，并將它們連接起來(lái)得到一個(gè)df。

from google.colab import drive
 drive.mount('/content/gdrive')
 import pandas as pd
 import matplotlib.pyplot as plt
 import seaborn as sns
 import numpy as np
 
 file_path = '/content/gdrive/My Drive/Colab Notebooks/Data/SR-2019.xlsx'
 records2019 = pd.read_excel(file_path)#,encoding='utf16')
 
 file_path = '/content/gdrive/My Drive/Colab Notebooks/Data/SR-2020.xlsx'
 records2020 = pd.read_excel(file_path)#,encoding='utf16')
 
 file_path = '/content/gdrive/My Drive/Colab Notebooks/Data/2021_Monthly_Service_Requests_EN.xlsx'
 records2021 = pd.read_excel(file_path)#,encoding='utf16')
 
 file_path = '/content/gdrive/My Drive/Colab Notebooks/Data/2022_Monthly_Service_Requests.csv'
 records2022 =pd.read_csv(file_path)
 
 records=pd.concat([records2019,records2020,records2021,records2022],axis=0)

讓我們根據(jù)服務(wù)調(diào)用日期聚合這些數(shù)據(jù)，并得到一個(gè)簡(jiǎn)單的圖。

records["DATE_RAISED"]=pd.to_datetime(records.DATE_RAISED)
 record_by_date=records.groupby("DATE_RAISED")["TYPE"].count().sort_index()
 record_by_date.plot(figsize = (25, 10))
 plt.ylabel('Number of requests')
 plt.grid(visible=True,which='both')
 plt.figure()
 
 record_by_date.iloc[100:130].plot(figsize = (25, 10))
 plt.ylabel('Number of requests')
 plt.grid(visible=True,which='both')

填充缺失

讓我們檢查一下我們的數(shù)據(jù)是否包含了所有的日期。

start_date = record_by_date.index.min()
 end_date = record_by_date.index.max()
 
 # create a complete date range for the period of interest
 date_range = pd.date_range(start=start_date, end=end_date, freq='D')
 
 # compare the date range to the index of the time series
 missing_dates = date_range[~date_range.isin(record_by_date.index)]
 
 if len(missing_dates) > 0:
     print("Missing dates:", missing_dates)
 else:
     print("No missing dates")

正如所預(yù)期的那樣，數(shù)據(jù)缺少一些日期的值。讓我們用相鄰日期的平均值填充這些值。

# Reindex to fill missing dates
 idx = pd.date_range(start=record_by_date.index.min(), end=record_by_date.index.max(), freq='D')
 record_by_date = record_by_date.reindex(idx, fill_value=0)
 
 # Add missing dates with average of surrounding values
 for date in missing_dates:
     prev_date = date - pd.DateOffset(days=1)
     next_date = date + pd.DateOffset(days=1)
     prev_val = record_by_date.loc[prev_date] if prev_date in record_by_date.index else np.nan
     next_val = record_by_date.loc[next_date] if next_date in record_by_date.index else np.nan
     avg_val = np.nanmean([prev_val, next_val])
     record_by_date.loc[date] = avg_val

這就是我們要做的所有預(yù)處理了，在所有這些步驟之后，我們嘗試檢測(cè)這個(gè)時(shí)間序列的周期。一般來(lái)說(shuō)，基于假日模式和一般的人類(lèi)習(xí)慣，我們希望在數(shù)據(jù)中看到七天的周期，我們來(lái)看看是不是有這樣的結(jié)果。

1、目測(cè)

最簡(jiǎn)單的方法就是目測(cè)。這是一種主觀(guān)的方法，而不是一種正式的或統(tǒng)計(jì)的方法，所以我把它作為我們列表中的原始方法。

如果我們看一下這張圖的放大部分，我們可以看到7天的周期。最低值出現(xiàn)在5月14日、21日和28日。但最高點(diǎn)似乎不遵循這個(gè)模式。但在更大的范圍內(nèi)，我們?nèi)匀豢梢哉f(shuō)這個(gè)數(shù)據(jù)集的周期是7天。

下面我們來(lái)正式的進(jìn)行分析：

2、自相關(guān)分析

我們將繪制時(shí)間序列的自相關(guān)值。查看acf圖中各種滯后值的峰值。與第一個(gè)顯著峰值對(duì)應(yīng)的滯后可以給出周期的估計(jì)。

對(duì)于這種情況，我們看看50個(gè)滯后值，并使用statmodels包中的方法繪制acf。

from statsmodels.graphics.tsaplots import plot_acf
 
 fig, ax = plt.subplots(figsize=(14,7))
 plot_acf(record_by_date.values.squeeze(), lags=50,ax=ax,title='Autocorrelation', use_vlines=True);
 lags = list(range(51))
 ax.set_xticks(lags);
 ax.set_xticklabels(lags);

從上圖可以看出，在7、1、21等處有峰值。這證實(shí)了我們的時(shí)間序列有7天的周期。

3、快速傅里葉變換

對(duì)時(shí)間序列進(jìn)行傅里葉變換，尋找主頻分量。主頻率的倒數(shù)可以作為周期的估計(jì)值。

傅里葉變換是一種數(shù)學(xué)運(yùn)算，它把一個(gè)復(fù)雜的信號(hào)分解成一組更簡(jiǎn)單的正弦和余弦波。傅里葉變換廣泛應(yīng)用于信號(hào)處理、通信、圖像處理以及其他許多科學(xué)和工程領(lǐng)域。它允許我們?cè)陬l域中分析和操作信號(hào)，這通常是一種比在時(shí)域中更自然和直觀(guān)的理解和處理信號(hào)的方法。

from scipy.fft import fft
 
 # Calculate the Fourier transform
 yf = np.fft.fft(record_by_date)
 xf = np.linspace(0.0, 1.0/(2.0), len(record_by_date)//2)
 
 # Find the dominant frequency
 # We have to drop the first element of the fft as it corresponds to the
 # DC component or the average value of the signal
 idx = np.argmax(np.abs(yf[1:len(record_by_date)//2]))
 freq = xf[idx]
 
 period =(1/freq)
 print(f"The period of the time series is {period}")

輸出為：The period of the time series is 7.030927835051545。這與我們使用acf和目視檢查發(fā)現(xiàn)的每周周期相似。

4、周期圖

周期圖 Periodogram 是一個(gè)信號(hào)或序列的功率譜密度(PSD)圖。換句話(huà)說(shuō)它是一個(gè)顯示信號(hào)中每個(gè)頻率包含多少總功率的圖表。周期圖是通過(guò)計(jì)算信號(hào)的傅里葉變換的幅值平方得到的，常用于信號(hào)處理和頻譜分析。在某種意義上，只是前面給出的基于fft的方法的擴(kuò)展。

from scipy.signal import periodogram
 
 freq, power = periodogram(record_by_date)
 period = 1/freq[np.argmax(power)]
 print(f"The period of the time series is {period}")
 
 plt.plot(freq, power)
 plt.xlabel('Frequency (Hz)')
 plt.ylabel('Power spectral density')
 plt.show()

周期圖可以清楚地看出，信號(hào)的最高功率在0.14，對(duì)應(yīng)于7天的周期。

總結(jié)

本文，我們介紹了尋找時(shí)間序列周期的三種不同方法，通過(guò)使用這三種方法，我們能夠識(shí)別信號(hào)的周期性，并使用常識(shí)進(jìn)行確認(rèn)。

責(zé)任編輯：華軒來(lái)源： DeepHub IMBA

間序列周期數(shù)據(jù)集

點(diǎn)贊

51CTO技術(shù)棧公眾號(hào)

業(yè)務(wù)
速覽

媒體

51CTO CIOAge HC3i

社區(qū)

51CTO博客鴻蒙開(kāi)發(fā)者社區(qū) AI.x社區(qū)

教育

51CTO學(xué)堂精培企業(yè)培訓(xùn) CTO訓(xùn)練營(yíng)