Multivariate Time Series Forecasting in Python
- supriyamalla

- Mar 6, 2023
- 3 min read
Updated: Mar 14, 2023
Are you tired of making business decisions based on guesses and assumptions? Do you wish you could have a crystal ball that could accurately predict future trends and values? Well, while we may not have a mystical crystal ball, we do have something just as powerful – time series forecasting.

Recently, I came across a problem where there was a need to forecast sales quantity for each store and product with each store/plant selling multiple products.
Here is a quick description of the sales data:
1. 40 weeks of data from July 2021 to April 2022
2. 11 stores/plants
3. 50 products per store
Objective: Forecast sales for individual store/plant and product for the next 12 weeks.
Challenge: Running an iterative for loop for 50*11=550 combinations for 40 weeks is a makeshift solution but not an efficient one. What if we want to run different ML models to train on each combination?
Solution: Is there an easy way to do this? Fortunately, yes, using the Scalecast library. It is easy to use and provides a unified interface to quickly prototype and test different models with minimal code.
Michael Keith’s posts on the Scalecast library (he is its creator) and its implementation were super helpful! A lot of the content I am sharing below is inspired by his posts.
In this blog post, we will further explore the Scalecast library and its capabilities for time series forecasting.
Here are the steps to get started:
1. Prepare data to eliminate any missing values
This is the first step in any time series forecasting project: load the data, check for missing values, and convert the data to a time series format. I also recommend removing spaces from column names (not a necessity, but it avoids conflicts later).
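A minimal sketch of this step, using made-up sample rows whose column names mirror the rest of the post (PlantID, Material Code, Date, Sales Qty); in practice the data would come from pd.read_csv or pd.read_excel:

```python
import pandas as pd

# Hypothetical raw sales rows standing in for the real dataset
data = pd.DataFrame({
    'PlantID': [101, 101, 102],
    'Material Code': ['A1', 'A2', 'A1'],
    'Date': ['2021-07-04', '2021-07-04', '2021-07-11'],
    'Sales Qty': [30.0, None, 12.0],
})

# Convert Date to a proper datetime type and check for missing values
data['Date'] = pd.to_datetime(data['Date'])
print(data.isna().sum())  # reveals one missing Sales Qty

# Remove spaces from column names to avoid conflicts later
data.columns = [c.replace(' ', '_') for c in data.columns]
```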
2. Create a Forecaster object which takes into account all variables
input_data = {}  # dictionary of Forecaster objects, one per store-product combination
for plant in data.PlantID.unique():
    for mat in data.Material_Code.unique():
        data_slice = data.loc[(data['PlantID'] == plant) & (data['Material_Code'] == mat)]
        # build the full weekly calendar; for missing weeks, assume 0
        load_dates = pd.date_range(data_slice.Date.min(), data_slice.Date.max(), freq='7D')
        data_load = pd.DataFrame({'Date': load_dates})
        data_load['Vol'] = data_load.merge(data_slice, how='left', on='Date')['Sales_Qty'].fillna(0).values
        f = Forecaster(y=data_load['Vol'], current_dates=data_load['Date'],
                       PlantID=plant, Material_Code=mat)  # add additional variables if you have them
        input_data[f"{plant}-{mat}"] = f
Explanation for each line of the code snippet:
- input_data = {}: creates an empty dictionary that will hold one forecasting object per store-product combination.
- for plant in data.PlantID.unique():: loops over every unique PlantID (store) in the data.
- for mat in data.Material_Code.unique():: within the plant loop, loops over every unique Material_Code (product), since every store sells multiple products.
- data_slice = data.loc[...]: subsets the original data to only the rows matching the current plant and product.
- load_dates / data_load: the weekly dates for this combination are collected and placed into a new DataFrame, data_load, with a Date column.
- data_load['Vol'] = ...: adds a Vol column by left-merging data_load with data_slice on Date and extracting the Sales_Qty values; weeks with no recorded sales are treated as 0.
- f = Forecaster(...): creates a Forecaster object with y set to the Vol column and current_dates set to the Date column; PlantID and Material_Code are attached as extra attributes so each combination can be identified later. A separate model is thus fit for each combination of PlantID and Material_Code.
- input_data[f"{plant}-{mat}"] = f: stores the object in input_data under a key that concatenates the current plant and product values.
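The "assume 0 for missing weeks" idea can be illustrated with a self-contained toy slice (the column names mirror the post; the 7-day frequency is an assumption about the data):

```python
import pandas as pd

# Toy slice for one plant-product pair: the week of 2021-07-18 is missing
data_slice = pd.DataFrame({
    'Date': pd.to_datetime(['2021-07-04', '2021-07-11', '2021-07-25']),
    'Sales_Qty': [5.0, 3.0, 7.0],
})

# Build the full weekly calendar, left-merge the observed sales onto it,
# and treat the unobserved week as zero demand
load_dates = pd.date_range(data_slice['Date'].min(), data_slice['Date'].max(), freq='7D')
data_load = pd.DataFrame({'Date': load_dates})
data_load['Vol'] = data_load.merge(data_slice, how='left', on='Date')['Sales_Qty'].fillna(0).values
print(data_load)  # 4 weekly rows; the missing week carries Vol = 0
```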
3. Download template validation grids, which contain predefined hyperparameter grids for the models you want to run
models = ('mlr','elasticnet','knn','rf','gbt','xgboost','mlp')
GridGenerator.get_example_grids()  # writes Grids.py to the working directory
GridGenerator.get_mv_grids()       # writes MVGrids.py for multivariate models
Explanation for each line of the code snippet:
models = ('mlr','elasticnet','knn','rf','gbt','xgboost','mlp'): This line defines a tuple models containing the names of several machine learning models: multiple linear regression, elastic net, k-nearest neighbors, random forest, gradient boosted trees, XGBoost, and a multilayer perceptron neural network. The two GridGenerator calls download validation grid files (predefined hyperparameter grids) into the working directory; f.tune() later reads the grid that matches each estimator's name.
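For context, the downloaded grid files define one hyperparameter dictionary per estimator name, which tuning later looks up. The grids below are illustrative stand-ins with made-up values, not the actual contents of Scalecast's Grids.py:

```python
# Illustrative only -- the real Grids.py / MVGrids.py contain fuller grids.
# Each dict is named after the estimator it tunes; during f.tune() the grid
# whose name matches the current estimator is pulled.
mlr = {
    'normalizer': ['scale', 'minmax', None],  # how to scale the regressors
}
knn = {
    'n_neighbors': list(range(2, 20)),
    'weights': ['uniform', 'distance'],
}
elasticnet = {
    'alpha': [0.1, 0.5, 1.0, 2.0],
    'l1_ratio': [0.0, 0.25, 0.5, 0.75, 1.0],
}
# Total candidate combinations for knn: len(n_neighbors) * len(weights)
n_knn_candidates = len(knn['n_neighbors']) * len(knn['weights'])
print(n_knn_candidates)
```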
4. Using the Forecaster objects, generate future dates and iterate over individual store-product combinations
for k, f in input_data.items():  # k, f iterate over the dictionary's keys and values
    f.generate_future_dates(12)  # predict the next 12 weeks
    f.set_test_length(10)
    f.set_validation_length(5)
    f.add_ar_terms(3)
    f.add_AR_terms((1, 10))
    if not f.adf_test():  # returns True if it thinks the series is stationary, False otherwise
        f.diff()
    f.add_seasonal_regressors('week', 'month', 'quarter', raw=False, sincos=True)
    f.add_seasonal_regressors('year')
    f.add_time_trend()
Explanation for each line of the code snippet:
The code iterates over each key-value pair in the input_data dictionary, where the keys are strings representing a unique combination of PlantID and Material_Code and the values are instances of the Forecaster class. For each Forecaster instance f, the following steps are performed:
f.generate_future_dates(12): This generates a sequence of future dates for the next 12 weeks that the forecaster will use to make predictions.
f.set_test_length(10): This sets the length of the test set to 10, which means that the last 10 weeks of data will be held out for evaluation purposes.
f.set_validation_length(5): This sets the validation set length to 5, meaning the 5 weeks immediately before the test set are used for hyperparameter tuning.
f.add_ar_terms(3): This adds autoregressive terms up to lag 3 to the model.
f.add_AR_terms((1,10)): This adds one seasonal autoregressive term with a spacing of 10 periods, i.e., the value from 10 weeks back.
if not f.adf_test(): f.diff(): This performs an Augmented Dickey-Fuller test to check if the time series is stationary. If the test indicates that the series is not stationary, the code applies differencing to transform the series into a stationary one.
f.add_seasonal_regressors('week','month','quarter',raw=False,sincos=True): This adds seasonal regressors based on the week, month, and quarter of the year, with the option to use sine and cosine functions to capture cyclical patterns.
f.add_seasonal_regressors('year'): This adds seasonal regressors based on the year.
f.add_time_trend(): This adds a linear time trend to the model.
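To see why sincos=True helps: a raw week-of-year integer makes week 52 and week 1 look maximally distant, while a sine/cosine pair puts them next to each other on the unit circle. A minimal sketch of the encoding idea (not Scalecast's internal code):

```python
import numpy as np
import pandas as pd

# Week-of-year for a few weekly dates
dates = pd.date_range('2021-07-05', periods=5, freq='7D')
week = dates.isocalendar().week.to_numpy().astype(float)

# Map the cyclical week-of-year (1..52) onto the unit circle
week_sin = np.sin(2 * np.pi * week / 52)
week_cos = np.cos(2 * np.pi * week / 52)

# Weeks 52 and 1 are 51 apart as integers, but close on the circle
w52 = np.array([np.sin(2 * np.pi * 52 / 52), np.cos(2 * np.pi * 52 / 52)])
w01 = np.array([np.sin(2 * np.pi * 1 / 52),  np.cos(2 * np.pi * 1 / 52)])
print(np.linalg.norm(w52 - w01))  # a small distance, unlike |52 - 1|
```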
5. Run pre-defined models over individual plant-product combinations
from tqdm.notebook import tqdm as log_progress  # progress bar for the loop

models = ('mlr', 'knn', 'svr', 'xgboost', 'gbt', 'elasticnet', 'mlp', 'rf')
for k, f in log_progress(input_data.items()):  # k, f iterate over input_data
    for m in models:
        f.set_estimator(m)
        f.tune()  # by default pulls the grid with the same name as the estimator (mlr pulls the mlr grid, etc.)
        f.auto_forecast()
    # combine models and run manually specified models of other varieties
    f.set_estimator('combo')
    f.manual_forecast(how='weighted', models=models, determine_best_by='ValidationMetricValue', call_me='weighted')
    f.manual_forecast(how='simple', models='top_5', determine_best_by='ValidationMetricValue', call_me='avg')
Explanation for each line of the code snippet:
This code is using the tqdm library to add a progress bar to the loop that iterates over each key-value pair in the input_data dictionary. Inside the loop, the code then performs the following steps:
It sets the estimator attribute of the Forecaster instance f to each model in the models tuple.
It calls the tune() method of f to perform hyperparameter tuning for the specified estimator using the default grid of hyperparameters with the same name as the estimator.
It calls the auto_forecast() method of f to generate forecasts for the test set.
It sets the estimator attribute of f to 'combo' and calls the manual_forecast() method twice to generate two sets of forecasts:
The first call uses the models argument to specify a list of models to combine and the how argument to specify a weighted combination. The determine_best_by argument specifies the metric to use for determining the weights, and call_me specifies a name for the resulting combination.
The second call uses the models argument to specify the top 5 models to average and the how argument to specify a simple average. The determine_best_by argument specifies the metric to use for selecting the top 5 models, and call_me specifies a name for the resulting average.
Overall, this code is using the Forecaster class to automate the process of time series forecasting, with the ability to select from several different models and hyperparameters and combine them in different ways. The tqdm library is used to provide a visual progress indicator for the loop.
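To build intuition for the two combo forecasts, here is a toy sketch of a simple average and an inverse-error weighted average over hypothetical model predictions. Scalecast computes its own weights internally; the inverse-error scheme below is just one illustrative choice:

```python
import numpy as np

# Hypothetical 3-week forecasts from three tuned models,
# plus each model's validation-metric value (lower = better)
preds = {
    'mlr':     np.array([10.0, 12.0, 11.0]),
    'knn':     np.array([ 9.0, 13.0, 10.0]),
    'xgboost': np.array([11.0, 11.0, 12.0]),
}
val_err = {'mlr': 2.0, 'knn': 4.0, 'xgboost': 1.0}

# Simple average: equal weight for every model (the 'avg' idea)
avg = np.mean(list(preds.values()), axis=0)

# Weighted average: better-validating models get more weight (the 'weighted' idea)
inv = {m: 1.0 / e for m, e in val_err.items()}
total = sum(inv.values())
weighted = sum(preds[m] * (inv[m] / total) for m in preds)
print(avg, weighted)
```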
6. Finally extract your data!
forecast_info = pd.DataFrame()   # final forecasted values
forecast_info1 = pd.DataFrame()  # model summaries
for k, f in input_data.items():
    df = f.export(dfs=['lvl_fcsts'], determine_best_by='LevelTestSetMAPE')
    df1 = f.export(dfs=['model_summaries'], determine_best_by='LevelTestSetMAPE')
    df['Name'] = k
    df['Plant'] = f.PlantID
    df['Material Code'] = f.Material_Code
    df1['Name'] = k
    df1['Plant'] = f.PlantID
    df1['Material Code'] = f.Material_Code
    forecast_info = pd.concat([forecast_info, df], ignore_index=True)
    forecast_info1 = pd.concat([forecast_info1, df1], ignore_index=True)

with pd.ExcelWriter('model_summaries.xlsx') as writer:  # context manager replaces the deprecated writer.save()
    forecast_info.to_excel(writer, sheet_name='Sheet1', index=False)
    forecast_info1.to_excel(writer, sheet_name='Sheet2', index=False)
Explanation for each line of the code snippet:
This code iterates through the input_data dictionary and, for each item, exports the level forecasts and model summaries using the export method of the Forecaster class. It appends the exported data to two pandas DataFrames, forecast_info and forecast_info1, and finally writes them to two sheets of an Excel workbook.
That’s pretty much it! The final file you extract will contain both the model summaries (with the best model’s stats) and the final forecasted values!
Find the complete notebook here.
And done!

