# Time Series Forecasting with RNNs

Welcome to the next session of our series on Practical Machine Learning with TensorFlow 2.0. In this session, we will use RNNs to build models for time series data. In this particular exercise, we will build two models: one for predicting a univariate time series and a second for predicting a multivariate time series. So, let us begin by importing the necessary packages.

Here we are going to use a weather time series dataset recorded by the Max Planck Institute for Biogeochemistry. The dataset contains 14 different features, such as air temperature, atmospheric pressure and humidity. These were collected every ten minutes beginning in 2003; for efficiency, we will use only the data collected between 2009 and 2016. This dataset was prepared by François Chollet. We will download the dataset, which is a CSV file, and read the CSV data.

Let us look at the top five entries of the dataset. You can see that the observations are recorded every 10 minutes, so for a single hour we have 6 observations, and in a day we have 144 observations (6 observations per hour over 24 hours). Given a specific time, let us say you want to predict the temperature 6 hours in the future; in order to make this prediction, we choose to use 5 days of observations.

Thus, you need to create a window containing the last 720 observations to train the model. Many such configurations are possible, making this a good dataset for experimentation. If we take the last 5 days of data, with 144 observations in a day, we get 720 observations of history. From these 720 observations, we try to predict the next 36 observations, because we want to predict the temperature for 6 hours into the future and every hour has 6 observations.
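The window arithmetic above is easy to check in a few lines. The constant and variable names are illustrative only and are not taken from the lecture's notebook.

```python
# Sampling arithmetic for a series recorded every 10 minutes.
MINUTES_PER_OBSERVATION = 10

obs_per_hour = 60 // MINUTES_PER_OBSERVATION   # 6 observations per hour
obs_per_day = 24 * obs_per_hour                # 144 observations per day

history_size = 5 * obs_per_day                 # 5 days of history
target_size = 6 * obs_per_hour                 # 6 hours into the future

print(obs_per_hour, obs_per_day, history_size, target_size)  # 6 144 720 36
```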

The univariate_data function returns the windows of time on which the model is trained. It takes the parameter history size, which is the size of the past window of information. Target size is how far in the future the model needs to learn to predict; the value at that offset is the label that needs to be predicted. We define the start and end index of the data, and for each position we select the history window and the corresponding label. For this exercise, we will use the first 300,000 rows of the data for training and the remaining ones for validation.
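A minimal sketch of what such a windowing function could look like is shown here. The lecture names the function and its history size and target size parameters; the exact signature, the indexing, and the toy series are assumptions made for illustration.

```python
import numpy as np


def univariate_data(dataset, start_index, end_index, history_size, target_size):
    """Slice a 1-D series into (history window, label) pairs.

    The signature and indexing are assumptions for illustration.
    """
    data, labels = [], []
    # The first usable position needs `history_size` points behind it.
    start_index = start_index + history_size
    if end_index is None:
        # Leave room for the label at position i + target_size.
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        window = dataset[i - history_size:i]
        data.append(window.reshape(history_size, 1))
        labels.append(dataset[i + target_size])
    return np.array(data), np.array(labels)


# Toy series 0..9; with target_size=0 the label is the value right after the window.
series = np.arange(10, dtype=np.float32)
x, y = univariate_data(series, 0, None, history_size=3, target_size=0)
print(x.shape, y.shape)  # (7, 3, 1) (7,)
```

In this sketch the first window holds the values 0, 1, 2 and its label is 3. Passing the training split as the end index and the same value as the start index for validation would keep the two sets from overlapping.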

This amounts to about 2,100 days' worth of training data (300,000 rows at 144 observations per day). Let us define the training split; we will set a seed to ensure that the results are reproducible. Let us begin with the first part, which deals with forecasting a univariate time series. Here we will use a single variable, temperature, and use it to make predictions for temperature values in the future. We first extract the temperature information from the dataset.

Let us look at the temperature data; this is the temperature as recorded on different dates. We will normalize the features before training a neural network. We use z-score normalization, for which we need to compute the mean and the standard deviation. We normalize the data by subtracting the mean from every data point and dividing the result by the standard deviation. Now that we have normalized the data, we will create the data for the univariate model. The model will be given the last 20 recorded temperatures, and it needs to predict the temperature at the next time step.
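The z-score step can be sketched with NumPy as follows. The temperature array is synthetic stand-in data, not the real CSV column. Computing the statistics on the training rows only is an assumption of this sketch, made so that no information from the validation rows leaks into training.

```python
import numpy as np

TRAIN_SPLIT = 300_000  # the first 300,000 rows are used for training

# Synthetic stand-in for the temperature column (the real values come from the CSV).
rng = np.random.default_rng(13)
temperature = rng.normal(loc=9.0, scale=8.0, size=420_000)

# Assumption: mean and standard deviation are computed on the training rows only.
train_mean = temperature[:TRAIN_SPLIT].mean()
train_std = temperature[:TRAIN_SPLIT].std()

# z-score: subtract the mean, then divide by the standard deviation.
normalized = (temperature - train_mean) / train_std

print(normalized.shape)
```

After this step the training portion has mean close to 0 and standard deviation close to 1, while the validation portion is scaled with the same two numbers.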

We use the univariate_data function that we defined to extract the training and the target data, and we apply the same function to obtain both the training and the validation datasets. Let us look at what this function returns: a single window of past history with 20 values, and the target temperature to predict.

Now that the data has been created, let us look at a single example. The information given to the network is in blue, and you want to predict the point marked by the red cross, which is the actual value. Before proceeding to train a model, it is always a good idea to establish a baseline. Here, a simple baseline is to look at the history and predict the next point as the average of the last 20 observations.
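The averaging baseline described above can be sketched in a couple of lines. The history window and the true future value here are made-up numbers chosen for illustration, and the function name is hypothetical.

```python
import numpy as np


def baseline(history):
    """Predict the next value as the mean of the history window."""
    return np.mean(history)


# A toy window of 20 normalized temperatures with a steady upward trend,
# followed by a made-up true next value.
history = np.linspace(-1.0, 0.9, 20)
true_future = 1.0

prediction = baseline(history)
print(f"prediction={prediction:.3f}, absolute error={abs(prediction - true_future):.3f}")
```

With a trending window like this one, the average sits in the middle of the history rather than near its most recent values, which is why this baseline can land far from the true next point.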

Let us look at how this baseline performs; we simply apply np.mean to the history. You can see that the baseline prediction is quite far from the true future value. Now, we will see if we can do better than the baseline. Here we will train a recurrent neural network (RNN), a type of model suited to sequential data, of which a time series is one example. RNNs process a time series step by step, maintaining an internal state that summarizes the information they have seen so far. Here we will use an LSTM model for modeling the time series data.

Before we begin, we prepare dataset objects for training and validation. We first shuffle the dataset, then batch it, and then cache it. This is what the data looks like after batching: the first axis is the batch of examples, the second axis is the time steps, and the third axis holds the features. Let us build a simple LSTM model: an LSTM layer with 8 units, whose output is passed through a dense layer with a single unit.

Note that we are not using any activation here because this is a regression problem: we want to predict temperature, which is a real number. We use Adam as the optimizer and mean absolute error as the loss. Let us make a sample prediction and check the output of the model.
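A minimal sketch of the model described above, assuming TensorFlow 2.x is installed. The lecture builds a sequential model; the functional API is used here purely as a choice of this sketch, and the dummy batch of zeros stands in for real windows.

```python
import numpy as np
import tensorflow as tf

HISTORY_SIZE = 20  # past temperatures per example
NUM_FEATURES = 1   # univariate: temperature only

inputs = tf.keras.Input(shape=(HISTORY_SIZE, NUM_FEATURES))
hidden = tf.keras.layers.LSTM(8)(inputs)    # 8 units, returns only the last output
outputs = tf.keras.layers.Dense(1)(hidden)  # no activation: plain regression output
model = tf.keras.Model(inputs, outputs)

# Adam optimizer with mean absolute error as the loss.
model.compile(optimizer="adam", loss="mae")

# Sample prediction on a dummy batch of 4 windows, before any training.
dummy_batch = np.zeros((4, HISTORY_SIZE, NUM_FEATURES), dtype="float32")
sample_prediction = model.predict(dummy_batch, verbose=0)
print(sample_prediction.shape)  # (4, 1)
```

Checking the output shape before training is a cheap way to confirm that the model produces one value per input window.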

Let us train the model. We will run 10 epochs, and in each epoch we train the model for only 200 steps. Let us predict using the simple LSTM model. You can see that in the first case the prediction is very close; in the second case it is slightly off; in the third case it is quite close. So this looks better than the baseline. Now that you have seen the basics of how to train an RNN model for time series forecasting, we will move on to the next part, where we will do forecasting for a multivariate time series.

For the sake of this exercise, we will select just three of the 14 features: air temperature, atmospheric pressure and air density. Let us look at how each of these features varies across time. You can see that these features are on different scales and have different variability across time. Let us normalize the data using the mean and standard deviation. Now, we will use a single step model; this model learns to predict a single point in the future based on the history provided.

Let us define the function for performing the windowing task; here you sample the past observations based on the step size that is given. In this exercise, we will show the last 5 days of data to the network; the last 5 days constitute 720 observations, which we sample every hour. The sampling is done every hour since a drastic change is not expected within 60 minutes. Thus, 120 observations represent the history of the last 5 days. For the single step prediction model, the label for a data point is the temperature 12 hours into the future. In order to create this label, the temperature after 72 observations is used.

Note that we have 6 observations per hour, so in 12 hours we have a total of 72 observations; that is why we use the temperature that is 72 observations in the future. So, we use a past history of 720 points, and the target is 72 observations ahead of the current point. Let us create training and validation datasets and look at the shape of a single training example: we have 120 points, and each point has three features. We construct a dataset object with from_tensor_slices, and then we shuffle it, batch it and cache it; in the case of validation we only batch the dataset.
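The single step windowing described above can be sketched with NumPy as follows. The helper name, its signature, the synthetic data, and the choice of column 0 as the temperature are all assumptions of this sketch and are not taken from the lecture's notebook.

```python
import numpy as np


def single_step_windows(features, target, history_size, target_size, step):
    """Build subsampled history windows with one future label per window.

    Hypothetical helper; the name and signature are assumptions.
    """
    data, labels = [], []
    for i in range(history_size, len(features) - target_size):
        # Keep every `step`-th row of the last `history_size` observations.
        data.append(features[i - history_size:i:step])
        # The label is a single value `target_size` observations ahead.
        labels.append(target[i + target_size])
    return np.array(data), np.array(labels)


# Synthetic stand-in: 2,000 rows of 3 features; column 0 plays the role of temperature.
rng = np.random.default_rng(13)
features = rng.normal(size=(2000, 3))

x, y = single_step_windows(features, features[:, 0],
                           history_size=720, target_size=72, step=6)
print(x.shape, y.shape)  # (1208, 120, 3) (1208,)
```

With a history of 720 observations and a step of 6, each window keeps 120 rows of 3 features, which matches the shape of a single training example discussed above.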

We build the same sequential model with an LSTM layer and a dense layer, and we use RMSprop as the optimizer and mean absolute error as the loss function. Let us check out the sample predictions, and then train the model. Let us look at how training progressed. The training loss comes down steadily; the validation loss comes down and then goes up, following a slightly zigzag pattern that is not so smooth.

Now that the model is trained, let us make a few sample predictions. The model is given the history of three features for the past 5 days, sampled every hour, which is 120 data points. Since the goal is to predict the temperature, we only display the past temperature in the plot, and the prediction is made 12 hours into the future. You can see that in the first example the prediction and the true value are very close. In the second example they are also very close, and in the third example they are very close as well. So, you can see that using more features, you are able to predict the temperature quite accurately.

In the multi step model, instead of predicting just one value, we will try to predict multiple values in the future. Now, let us think about what the training data looks like for this; can you pause the video for a couple of minutes, think about it, and then continue watching to check whether your understanding is correct?

You can see that for the multi step model, the training data again consists of recordings over the past five days sampled every hour, which is 120 points. However, the model now needs to learn to predict the temperature for the next 12 hours; since an observation is taken every 10 minutes, we will have 72 predictions over the next 12 hours. So, our target now has 72 values, and we need to predict these 72 values from the last 120 values.
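To make the difference from the single step case concrete, here is a small NumPy sketch in which each label is a run of 72 future values instead of one. The helper name, its signature, and the synthetic data are assumptions made for illustration.

```python
import numpy as np


def multi_step_windows(features, target, history_size, target_size, step):
    """Build subsampled history windows with `target_size` future labels each.

    Hypothetical helper; the name and signature are assumptions.
    """
    data, labels = [], []
    for i in range(history_size, len(features) - target_size):
        # 720 past observations, subsampled every `step` rows.
        data.append(features[i - history_size:i:step])
        # All `target_size` values that follow the window, not just one.
        labels.append(target[i:i + target_size])
    return np.array(data), np.array(labels)


# Synthetic stand-in: 2,000 rows of 3 features; column 0 plays the role of temperature.
rng = np.random.default_rng(13)
features = rng.normal(size=(2000, 3))

x, y = multi_step_windows(features, features[:, 0],
                          history_size=720, target_size=72, step=6)
print(x.shape, y.shape)  # (1208, 120, 3) (1208, 72)
```

The inputs have the same shape as in the single step case; only the label changes, from a scalar to a vector of 72 values at the original 10 minute resolution.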

Let us create the dataset. You can see that each training example has 120 points, each with 3 features, and we want to predict 72 different values in the future. In the plot, the smooth line is the history and the dots are the future values to be predicted. Since this task is a bit more complicated than the previous one, we will use two LSTM layers. Finally, the 72 predictions are made using a dense layer with 72 outputs.

We are using RMSprop as the optimizer, and we pass clipvalue as an argument to clip the gradients. We use mean absolute error as the loss function here. Let us look at how the model predicts before it is trained. Let us train the model; we will train for 200 steps in every epoch. Now that the model is trained, let us look at how the training proceeded. Let us see how well the model has learned to predict the future.
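The multi step model and optimizer described above could be sketched as follows, assuming TensorFlow 2.x is installed. The layer sizes (32 and 16 units) and the clip value of 1.0 are assumptions, since the lecture does not state them; the functional API is again just a choice of this sketch.

```python
import numpy as np
import tensorflow as tf

inputs = tf.keras.Input(shape=(120, 3))  # 120 hourly steps, 3 features
# The first LSTM returns the full sequence so that the second LSTM
# receives one vector per time step.
hidden = tf.keras.layers.LSTM(32, return_sequences=True)(inputs)
hidden = tf.keras.layers.LSTM(16)(hidden)
outputs = tf.keras.layers.Dense(72)(hidden)  # one output per future observation
model = tf.keras.Model(inputs, outputs)

# clipvalue=1.0 clips each gradient element to the range [-1.0, 1.0];
# the value itself is an assumption of this sketch.
model.compile(optimizer=tf.keras.optimizers.RMSprop(clipvalue=1.0), loss="mae")

# Prediction of the untrained model on a dummy batch of 2 windows.
untrained = model.predict(np.zeros((2, 120, 3), dtype="float32"), verbose=0)
print(untrained.shape)  # (2, 72)
```

Clipping keeps individual gradient values bounded, which can help when training recurrent layers on long sequences.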

You can see that the true values in the future are in blue, whereas the model predictions are in red. In the first case the model predicts the next values quite well. In the second and third cases there are some deviations, but the model seems to be doing a decent job. You can see that the predicted curve is quite smooth; it does not capture some of the finer details, such as abrupt turns, but overall it is doing a decent job. So, that was it for modeling time series with RNNs.

Wasn’t it fun? I am sure you will use these models to forecast future values in some interesting time series. Hope you enjoyed learning these concepts with us.