Showcase

Pipeline Example

Background

Erma's Original Frozen Custard is a walk-up frozen custard establishment in southeast Michigan, with stores located in Saint Clair Shores, Shelby Twp. and Warren. The stores use Upserve for POS, Shopify for online orders, and DoorDash for delivery. These systems have been integrated with weather data from OpenWeatherMap to predict sales so that the stores can be staffed appropriately. Additionally, a reporting pipeline has been created that pushes the aforementioned data up to Qlik Cloud using the Qlik DataTransfer tool. The sections below describe the design of these pipelines and the various applications that support them.

Note 1: The pipeline is being updated with the addition of a new POS. Those changes will be reflected when complete.

Note 2: All software described herein is the property of Erma's Frozen Custard, Inc. and can be made available upon request.


Process Flow

The diagram below shows a high-level view of the pipelines and will be referenced throughout this post. The numbers correspond to the sections in the detailed descriptions below.




API Client Applications (Section 1)

The API client applications that extract data from the cloud sources are all .NET Core console applications written in C# and triggered by Windows Task Scheduler. They use RestSharp and Newtonsoft.Json to retrieve the data from the source systems. The JSON retrieved is parsed and inserted into tables that were set up to reflect the likely schema used in the source system (mostly normalized). All of the solutions use log4net for logging, and all secrets are encrypted and retrieved at runtime.
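For illustration, here is a minimal sketch of that extract/parse/insert flow. The production clients are C# (.NET Core) using RestSharp and Newtonsoft.Json; this Python version only mirrors the pattern, and the endpoint, secret names, and table/column names are placeholders rather than the real Upserve, Shopify, or OpenWeatherMap values.

```python
"""Sketch of the API client pattern: pull JSON from a source system and insert
the parsed rows into SQL. Endpoint, secrets, and table names are placeholders."""
import os
import requests
import pyodbc

API_URL = "https://api.example-pos.com/v1/orders"   # placeholder endpoint
API_KEY = os.environ["POS_API_KEY"]                  # secret retrieved at runtime

def fetch_orders():
    # Retrieve the raw JSON payload from the source system.
    resp = requests.get(API_URL, headers={"Authorization": f"Bearer {API_KEY}"}, timeout=30)
    resp.raise_for_status()
    return resp.json()["orders"]

def load_orders(orders):
    # Insert parsed rows into a table shaped like the (assumed) source schema.
    conn = pyodbc.connect(os.environ["SQL_CONN_STR"])
    cur = conn.cursor()
    for o in orders:
        cur.execute(
            "INSERT INTO dbo.Orders (OrderId, StoreId, OrderDate, Total) VALUES (?, ?, ?, ?)",
            o["id"], o["store_id"], o["created_at"], o["total"],
        )
    conn.commit()
    conn.close()

if __name__ == "__main__":
    load_orders(fetch_orders())
```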

Model Builder (Section 2)

The intent of the modelling is to predict sales as a function of weather and any other temporal predictors available. The application extracts the data stored by the API client applications described in the previous step, as well as CSV files containing DoorDash and historical sales data. The CSV files must be extracted manually since we do not currently have API access to the DoorDash backend. The modeller combines the sales data (database and CSV) with the weather data from the database and performs typical preprocessing (missing-value handling, reshaping, combining, aggregating, standardizing, splitting). Once the predictor and label frames are created for both the training and test sets, a gradient boosting regressor, a random forest regressor, and a KNN model are each fit using the best hyper-parameters found via an exhaustive grid search. Each model is then used to make predictions on the test set and the resulting RMSE is calculated. The model with the best RMSE is pickled and handed off to the prediction web site described further down this page.
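The model-selection step can be sketched as follows. The hyper-parameter grids, feature frames, and output file name are illustrative only; the production grids and preprocessing live in the private repo.

```python
"""Sketch of model selection: grid-search three regressors, compare test RMSE,
and pickle the winner for the prediction engine (Section 3)."""
import pickle
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error

def select_model(X_train, y_train, X_test, y_test):
    candidates = {
        "gbr": (GradientBoostingRegressor(), {"n_estimators": [100, 300], "learning_rate": [0.05, 0.1]}),
        "rf":  (RandomForestRegressor(),     {"n_estimators": [100, 300], "max_depth": [None, 10]}),
        "knn": (KNeighborsRegressor(),       {"n_neighbors": [3, 5, 9]}),
    }
    best_name, best_model, best_rmse = None, None, np.inf
    for name, (estimator, grid) in candidates.items():
        # Exhaustive grid search for each model's best hyper-parameters.
        search = GridSearchCV(estimator, grid, scoring="neg_root_mean_squared_error", cv=5)
        search.fit(X_train, y_train)
        # Evaluate the tuned model on the held-out test set.
        rmse = mean_squared_error(y_test, search.predict(X_test), squared=False)
        if rmse < best_rmse:
            best_name, best_model, best_rmse = name, search.best_estimator_, rmse
    # Persist the winning model for the prediction web site.
    with open("sales_model.pkl", "wb") as f:
        pickle.dump(best_model, f)
    return best_name, best_rmse
```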

The model builder application is written in Python 3.8 (scikit-learn / pandas / NumPy) and was developed in Visual Studio 2022. The solution implements a singleton debugging logger, uses a config file for settings changes, employs multi-threading for the data extraction and modelling tasks, and follows the single-responsibility principle. The repo for the modelling solution is not public since it is intellectual property.
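As a rough illustration of two of those patterns, the sketch below shows a singleton debug logger and threaded extraction tasks. The class and function names are hypothetical and are not taken from the private repo.

```python
"""Illustrative singleton logger plus threaded extraction, mirroring the
patterns described above. Names are hypothetical."""
import logging
import threading
from concurrent.futures import ThreadPoolExecutor

class DebugLogger:
    """Process-wide singleton wrapper around the standard logging module."""
    _instance = None
    _lock = threading.Lock()

    def __new__(cls):
        with cls._lock:
            if cls._instance is None:
                cls._instance = super().__new__(cls)
                logging.basicConfig(level=logging.DEBUG,
                                    format="%(asctime)s %(threadName)s %(message)s")
                cls._instance.log = logging.getLogger("modeller")
            return cls._instance

def run_extractions(tasks):
    # Run the independent extraction jobs (SQL weather pull, CSV sales load, ...) in parallel.
    logger = DebugLogger().log
    results = {}
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {pool.submit(task): task.__name__ for task in tasks}
        for future, name in futures.items():
            results[name] = future.result()
            logger.debug("extraction task %s finished", name)
    return results
```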

The logic flow provides a detailed summary of the program execution in the modelling solution.

Predictions (Section 3)

The predictions section consists of a Python solution (engine) that actually makes the predictions, a transaction broker SQL Server database, a .NET Core web app (UI) that gathers user input and displays the prediction results, and a cloned OpenWeatherMap client that hits the OWM API and stores weather data in SQL. These four elements are discussed in turn below, but first take a look at the ladder diagram, which encapsulates the system logic quite well. The weather data collection is not included, as it is a simple CRON-style job that runs every hour, triggered by Windows Task Scheduler.

The web app is written in C# and was developed in Visual Studio 2022. The application is responsible for handling authentication, gathering the store and date input from the user, passing that request to the SQL Server transaction broker, retrieving the reply from the broker, and posting the results to the view.

The SQL Server transaction broker handles authentication persistence and stores the requested and completed jobs. A few stored procedures make common tasks a bit easier, but overall the database is very simple.

The Python prediction engine gathers the requests from the SQL database, spins up a thread for each request, gets the aggregated weather data for the requested day, and uses the pickled model to make a sales prediction for that day. Finally, the results are written back to SQL for consumption by the UI web app.
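A simplified sketch of that request/predict/respond loop is shown below. The connection string, the table and column names (PredictionRequests, PredictionResults, DailyWeather), and the weather features are hypothetical stand-ins for the broker schema, not the actual definitions.

```python
"""Sketch of the prediction engine: pull pending requests from the broker
database, predict on a thread per request, and write results back for the UI."""
import pickle
import pyodbc
from concurrent.futures import ThreadPoolExecutor

CONN_STR = "DSN=PredictionBroker"          # placeholder connection string

with open("sales_model.pkl", "rb") as f:   # model produced in Section 2
    MODEL = pickle.load(f)

def get_pending_requests():
    conn = pyodbc.connect(CONN_STR)
    rows = conn.cursor().execute(
        "SELECT RequestId, StoreId, TargetDate FROM dbo.PredictionRequests WHERE Status = 'Pending'"
    ).fetchall()
    conn.close()
    return rows

def predict_request(req):
    request_id, store_id, target_date = req
    conn = pyodbc.connect(CONN_STR)
    cur = conn.cursor()
    # Aggregated weather features for the requested day (columns are illustrative).
    weather = cur.execute(
        "SELECT AvgTemp, AvgHumidity, TotalPrecip FROM dbo.DailyWeather WHERE WeatherDate = ?",
        target_date,
    ).fetchone()
    prediction = float(MODEL.predict([list(weather)])[0])
    # Write the result back and mark the request complete for the web app to pick up.
    cur.execute(
        "INSERT INTO dbo.PredictionResults (RequestId, StoreId, PredictedSales) VALUES (?, ?, ?)",
        request_id, store_id, prediction,
    )
    cur.execute("UPDATE dbo.PredictionRequests SET Status = 'Done' WHERE RequestId = ?", request_id)
    conn.commit()
    conn.close()

def run_once():
    # One thread per pending request, as described above.
    with ThreadPoolExecutor(max_workers=4) as pool:
        pool.map(predict_request, get_pending_requests())
```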

The weather collection client is identical to the one used in Section 1.

Reporting (Section 4)

The reporting section is responsible for moving data from the SQL Server instance that stores the Section 1 client query results to a reporting database and then up to Qlik Cloud. The detailed steps are as follows:

  • SQLBackUpAndFTP is used to back up the databases to the local machine. This job runs nightly and is configured to send an email should an error occur.
  • A custom .NET Framework console application written in C# ensures only the latest 10 backups are present on the SQL Server host machine and moves the latest to an on-premises file server (a sketch of this retention/copy logic appears after this list).
  • Another scheduled .NET Core console application written in C# gets the latest copy of each backup from the file server, decrypts and unzips the compressed file using 7za.exe (the 7-Zip command-line application), and restores the backups to a SQL Server instance on another host machine.
  • The Qlik DataTransfer application is used to transfer data from the local server to the Qlik Cloud tenant.
  • Qlik Cloud is then used to build visualizations, such as the example shown here, which displays the yearly summary report.
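For illustration, here is a minimal sketch of the retention/copy step from the second bullet. The production tool is a C# (.NET Framework) console app; this Python version only mirrors the logic, and the paths are placeholders.

```python
"""Sketch of the backup retention/copy step: ship the newest backup to the
file server and keep only the latest 10 on the SQL host. Paths are placeholders."""
from pathlib import Path
import shutil

BACKUP_DIR = Path(r"D:\SQLBackups")              # local backups written by SQLBackUpAndFTP
FILE_SERVER = Path(r"\\fileserver\sql-backups")  # on-premises file server share
KEEP = 10

def move_latest_and_prune():
    backups = sorted(BACKUP_DIR.glob("*.7z"), key=lambda p: p.stat().st_mtime, reverse=True)
    if not backups:
        return
    # Move the newest backup to the file server for the restore job downstream.
    shutil.move(str(backups[0]), str(FILE_SERVER / backups[0].name))
    # Keep only the 10 most recent of the remaining backups on the SQL host.
    for old in backups[1:][KEEP:]:
        old.unlink()

if __name__ == "__main__":
    move_latest_and_prune()
```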