How to deal with irregularly sampled time series data

Utpal Kumar 5 minute read TECHNIQUES December 27, 2021

While analyzing time series data, we often come across data that is non-uniformly sampled, i.e., they have non-equidistant time-steps. Infact, most of the recordings in nature are non-uniformly sampled. It is well known that the analysis of irregularly spaced data sets is more complicated than that of regularly spaced ones. How can we deal with such datasets? Can we always interpolate such time series to make it uniformly sampled?

While analyzing time series data, we often come across data that is non-uniformly sampled, i.e., they have non-equidistant time-steps. Infact, most of the recordings in nature are non-uniformly sampled. It is well known that the analysis of irregularly spaced data sets is more complicated than that of regularly spaced ones. How can we deal with such datasets? Can we always interpolate such time series to make it uniformly sampled?

In this post, we will see some effective methods to deal with irregularly sampled datasets.

Can we always interpolate irregularly sampled data?

When we come across irregularly sampled data, the first thing we tend to do is to interpolate or resample them. This makes our life much easier as we can directly apply standard methods of analyses. However, when we interpolate the time series data, we assume (sometimes unknowingly) that data samples behave monotonically at each interval. This assumption may not hold if we over interpolate the data, interpolate the data with large variations in data intervals or timesteps, and so on. If this assumption doesn’t hold then it may lead to many artifacts or misleading results.

Distribution of timesteps

One of the most important factor to consider while dealing with time series data is the distribution of their timesteps. Structure present in the data is often more clearly displayed by a continuous curve than by the scattered, clumped original data points. The best way to visualize the distribution is to plot the histogram of time intervals of the datasets.

I have an irregularly sampled real-world time series data in file in xlsx format.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('seaborn')

plt.rc('font', size=20) #controls default text size
plt.rc('axes', titlesize=20) #fontsize of the title
plt.rc('axes', labelsize=20) #fontsize of the x and y labels
plt.rc('xtick', labelsize=20) #fontsize of the x tick labels
plt.rc('ytick', labelsize=20) #fontsize of the y tick labels
plt.rc('legend', fontsize=20) #fontsize of the legend

# pip install openpyxl xlrd

df = pd.read_excel(xlsfilename, sheet_name=sheetname) 

timearray = pd.to_datetime(df['time'], format="%Y_%m_%d_%H_%M_%S_%f")
df['time'] = timearray
df['time'] = df['time'].sort_values(ascending=True)
df['time_sec'] = df['time'].dt.minute * 60 + df['time'].dt.second + df['time'].dt.microsecond / 10e5

df.plot(x="time",y="X1", figsize=(20,6))

dt = pd.Series(df['time_sec'].diff(), name='dt in seconds')
dt.value_counts()

10    2264
11    1835
11     705
10     548
12     338
13     130
12     108
14      46
13      42
15      40
14      20
16      11
17       9
16       7
09       5
09       5
15       4
18       2
19       2
10       1
20       1
18       1
Name: dt in seconds, dtype: int64

sns.set(rc={'figure.figsize':(12,6)})
ax = sns.distplot(dt, kde=False)
ax.set(yscale="log")
ax.set(ylabel="Number of observations")
plt.savefig('df_distplot.png', dpi=300, bbox_inches='tight')

As you can notice that the timestep has large variance in our dataset. Please note that large or small variance of the data is subject to the problem in consideration. If the variance is small enough to neglect then, we can simply interpolate the datasets.

However, if there is a large variance of the timesteps in the time series data, then we need to carefully proceed. If the distribution of timesteps is bimodal with one large timestep and another small timestep. In that case, it is best to go with large timestep as the interpolation timestep to avoid oversampling of the data.

Lomb–Scargle periodogram

If we want the do the spectral analysis of the non-uniform time series, then the Lomb-Scargle periodogram is the way to go. It was developed by Lomb [Lomb, N.R., 1976] and further extended by Scargle [Scargle, J.D., 1982] to find, and test the significance of weak periodic signals with uneven temporal sampling. Lomb–Scargle periodogram is a method that allows efﬁcient computation of a Fourier-like power spectrum estimator from unevenly sampled data, resulting in an intuitive means of determining the period of oscillation.

Periodogram of irregularly sampled data

Let us compute the fft based periodogram and Lomb-Scargle periodogram of our irregularly sampled data from the previous section.

fig, ax_list = plt.subplots(2,2, figsize=(20, 15), dpi=100)

# fig 1
ax=ax_list[0,0]
data = df['X1'].values
t_steps = df['time_sec'].values
ax.plot(t_steps, data, color='blue', alpha=0.7)
ax.set_ylabel("Signal")
ax.set_xlabel("Time")
ax.set_xlim((t_steps[0], t_steps[-1]))


# fig 2
ax=ax_list[0,1]
data = df['X1'].values
t_steps0 = df['time_sec'].values
t_steps = np.arange(t_steps0[0], t_steps0[0]+len(data))
ax.plot(t_steps, data, color='blue', alpha=0.7)
ax.set_ylabel("Signal")
ax.set_xlabel("Time")
ax.set_xlim((t_steps[0], t_steps[-1]))


# fig 3
ax=ax_list[1,0]
input_signal = df['X1'].values
t_steps = df['time_sec'].values

f, Pxx_den = signal.periodogram(input_signal, fs=10)
ax.plot(f, Pxx_den)
ax.set_ylabel('Power spectral density')
ax.set_xlabel('Frequency (Hz)')

# fig 4
# Plot Lomb-Scargle spectrogram of input signal
ax = ax_list[1,1]
data = df['X1'].values
t_steps = df['time_sec'].values
f = np.linspace(0.01, 10, 1000)
pgram = signal.lombscargle(x, y, f, normalize=True)
ax.plot(f, pgram)
ax.set_ylabel("Lomb-Scargle normalized periodogram")
ax.set_xlabel('Frequency (Hz)')
plt.savefig('periodogram_plot.png', dpi=300, bbox_inches='tight')

For plotting a Lomb–Scargle periodogram

Periodogram after resampling the data

Let us first resample time series to even intervals of 0.1s.

dfnew = df.resample("100l").last().interpolate(method="nearest")

fig, ax_list = plt.subplots(2,2, figsize=(20, 15), dpi=100)

# fig 1
ax=ax_list[0,0]
data = dfnew['X1'].values
t_steps = dfnew['time_sec'].values
ax.plot(t_steps, data, color='blue', alpha=0.7)
ax.set_ylabel("Signal")
ax.set_xlabel("Time")
ax.set_xlim((t_steps[0], t_steps[-1]))


# fig 2
ax=ax_list[0,1]
data = dfnew['X1'].values
t_steps0 = dfnew['time_sec'].values
t_steps = np.arange(t_steps0[0], t_steps0[0]+len(data))
ax.plot(t_steps, data, color='blue', alpha=0.7)
ax.set_ylabel("Signal")
ax.set_xlabel("Time")
ax.set_xlim((t_steps[0], t_steps[-1]))


# fig 3
ax=ax_list[1,0]
input_signal = dfnew['X1'].values
t_steps = dfnew['time_sec'].values

f, Pxx_den = signal.periodogram(input_signal, fs=10)
ax.plot(f, Pxx_den)
ax.set_ylabel('Power spectral density')
ax.set_xlabel('Frequency (Hz)')



# fig 4
# Plot Lomb-Scargle spectrogram of input signal
ax = ax_list[1,1]
data = dfnew['X1'].values
t_steps = dfnew['time_sec'].values
f = np.linspace(0.01, 10, 1000)
pgram = signal.lombscargle(x, y, f, normalize=True)
ax.plot(f, pgram)
ax.set_ylabel("Lomb-Scargle normalized periodogram")
ax.set_xlabel('Frequency (Hz)')
plt.savefig('periodogram_plot_evendata.png', dpi=300, bbox_inches='tight')

Conclusions

In this post, we saw a few aspects of analyzing time series data with uneven sampling. For more details, I refer readers to the references below, such as Understanding the Lomb–Scargle Periodogram, and others.

How to deal with irregularly sampled time series data

Can we always interpolate irregularly sampled data?

Distribution of timesteps

Lomb–Scargle periodogram

Periodogram of irregularly sampled data

Periodogram after resampling the data

Conclusions

Further readings

Understanding the metropolis-hastings algorithm: a step-by-step guide with python examples

Exploring the multi-taper method: a robust approach for time-frequency analysis

What is pink noise and how is it different from white noise?

Object oriented programming (oop) with examples in python

Disclaimer of liability

Leave a comment

Subscribe to our weekly newsletter

Can we always interpolate irregularly sampled data?

Distribution of timesteps

Lomb–Scargle periodogram

Periodogram of irregularly sampled data

Periodogram after resampling the data

Conclusions

Further readings

Disclaimer of liability

Leave a comment