Near Real-Time Data Export from Business Central
As someone with experience in ERP systems and business intelligence, I am always interested in learning about new ways to extract and analyze data from these systems. In this blog post, I am excited to share my experience with using Azure Synapse Analytics and Azure Data Lakes to load data from Business Central (BC) into Power BI. I hope you are as excited as I am to read about this fantastic blend of technologies.
Every day, every minute: fresh data from Business Central.
Cloud-enabled near real-time data export—Isn’t that the dream?!
I recently discovered an amazing open-source extension for Business Central that lets us seamlessly synchronize our data to an Azure Data Lake.
Post 1 of 3 - Blog series - Near-Real-Time Data Reporting on Business Central
Involved components:
Business Central (Navision in the cloud)
Azure Data Lakes
Azure Synapse Analytics (Pipelines, Mapping Data Flows, Spark Notebooks and Serverless SQL)
Power BI
bc2adls
Abbreviations:
BC - Business Central
ADLS - Azure Data Lake Storage
ERP - Enterprise Resource Planning
DB - Database
Glossary:
Delta loads: Incremental loads of only the changes since the last load. Should include new, changed and deleted rows/entries.
Batch: Jobs or activities that run for multiple rows/entries at a given interval, sometimes called mini-batches when near-real-time.
Near-real-time: A made-up term for small batches, closing the gap between developers and business users who think 1-hour intervals are real-time. My own definition is anything running in batches every hour or more frequently, without actually being streaming data.
Special mention: bc2adls tool
This tool is an essential piece of the overall solution and handles the raw ingestion from Business Central to Azure Data Lake Storage (BC 2 ADLS) extremely well, which is also why it deserves its own mention with due credits to Henri Schulte and Soumya Dutta 👏
Check it out here: bc2adls GitHub
Here are some of the reasons I highly recommend this tool:
Open source!
Near-real-time (with intervals as low as 1 minute).
Delta loads everything
Utilizes Read Scale-Out, the read-only replica database, for Business Central versions from 2021 and onwards
Easy to set up and configure
Security: Data is only pushed out of BC, so no downstream credentials are stored. Uses a service principal for the Data Lake authentication
We’ll cover in more depth how to set up and install the tool later, but to begin I believe it is important to understand what it does. The tool is two-fold, with a Business Central component and an Azure Synapse Analytics component, both interfacing over data stored in the Azure Data Lake.
Delta load into Azure Data Lake
How it works
How to set it up
Consolidating deltas into a “current” form (Incl. Serving data as Lake Databases)
How it works
How to set it up
Delta load from BC into Azure Data Lake
How it works
The first step is to set up the extension on the BC tenant; once it is set up, it handles much of the data loading on its own. For code-savvy readers who know AL, the extension is open source, so you can check out its source code in the GitHub repository mentioned above. For the rest of us, here is what we should understand.
After the extension is installed, configured, enabled and automated, it runs a batch job at the given interval. During this batch job it checks the selected tables for any deltas (new, changed or deleted rows).
First it checks for rows with a timestamp later than the latest exported timestamp, and if one or more of those exist, it exports them as a new .csv file that lands in the data lake under the main folder “deltas”, in a table-specific subfolder (as illustrated below).
Note: Running it for the first time is of course a full load and may be time-costly.
(Should you ever need to re-do the full load, there is a slightly hidden reset feature in the extension.)
Furthermore, it checks the system logs for any deleted rows, creates a synthetic delete entry as a .csv too, and places it in the same folder. This is very neat and highly needed: our subsequent transformation jobs need to know that the entry was in fact deleted. Handling deleted rows is, in my experience, an often overlooked but essential component of delta loads, and getting that “out of the box” here is highly appreciated!
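Once the first export has run, you can inspect this folder structure directly from Power Query. Here is a minimal sketch (the storage account and container names are placeholders you must replace with your own) that simply lists every file under the “deltas” folder, so you can see the table-specific subfolders and the exported .csv files:
let
    // Placeholders - replace <storageaccount> and <container> with your own values
    DeltasUrl = "https://<storageaccount>.dfs.core.windows.net/<container>/deltas",
    // Flat file listing of everything the extension has exported so far
    Source = AzureStorage.DataLake(DeltasUrl, [HierarchicalNavigation = false]),
    // Keep only the columns needed for a quick sanity check
    FileList = Table.SelectColumns(Source, {"Name", "Folder Path", "Date modified"})
in
    FileList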
How to set it up
Setting up Azure
We need the following to set up the connection in Business Central:
- Data Lake (Storage Account gen2 with hierarchical namespace enabled)
- Service principal account with the Storage Blob Data Contributor RBAC role on the Data Lake.
Here are the steps:
Set up an Azure Storage Account
https://docs.microsoft.com/en-us/azure/storage/blobs/create-data-lake-storage-account
Save for later:
- Storage account name
Create an Azure Service Principal (App registration)
Create the app registration
https://docs.microsoft.com/en-us/azure/active-directory/develop/quickstart-register-app#register-an-application
Set up an app secret
https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal#authentication-two-options
Add the following redirect URI: https://businesscentral.dynamics.com/OAuthLanding.htm
Save for later:
- Application (client) ID
- Directory (tenant) ID
- Secret value (Notice: you can only copy this upon creating the secret)
Add a role assignment in the Storage Account to the Service Principal account for the role Storage Blob Data Contributor
Installing the ADLS extension on Business Central
This part is definitely the part where I have the least expertise, so if you decide to test the whole thing by spinning up your own demo sandbox/prod environments, feel free to follow my steps laid out below.
But if not, then I highly recommend that you reach out to your local BC/AL developer for this part and let them do their thing, following the overall documentation from the solution’s repo:
Install the extension into BC using the code given in the businessCentral folder using the general guidance for developing extensions in Visual Studio code.
This is what worked for me. Again, not a BC developer.
Setting up a Business Central Sandbox
Setting up a Business Central Prod
Prepare Visual Studio Code (VSC)
Download and install VSC
Install the plugin “AL Language”
Prepare the file
Download the bc2adls .zip file from the GitHub repo
Unzip it
Build .app file
Go to Visual Studio Code
Open the “businessCentral” folder in the unzipped bc2adls folder (bc2adls/bc2adls/businessCentral)
Start “Build” (Shortcut: CTRL + SHIFT + B)
When prompted choose “Microsoft Cloud Sandbox”
Connect and Authenticate
Install “Symbols” if prompted
Retry “Build” if the first attempt failed 😅
Install the extension
Go to Business Central
Go to “Extensions”, Click “Management”
Click “Upload a file” and locate the .app file in the bc2adls folder (unzipped)
Configuring the extension
Locate the extension page by searching for “Export to Azure Data Lake” or go to page 82560
Add info to Tenant, Storage Account, Container, App ID and Secret value
(The Data Lake container will be created automatically if it doesn’t exist)
Add tables to the list view below
Choose fields for each table (otherwise nothing is exported)
Click “Export” to manually start the first full-load batch, and test that the connection works.
Scheduling a recurring export job
Inside the extension config, click “Schedule export”
Choose a start date and click OK
Configure Job for recurrence
Go to “Job Queue Entries”
Pause the job
Enable recurring run by choosing weekdays
Set no. of minutes (1440 for daily, 60 for hourly, 1 for every minute)
That’s it! Check out your data lake for these files to verify the export is working:
A deltas manifest and a data manifest file
A “deltas” folder with .csv files
A .cdm.json file for each of your exported tables
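If you prefer to verify this from Power Query rather than the Azure portal, here is a small sketch (again with placeholder storage account and container names) that opens the deltas manifest through the CDM connector, the same way the template at the end of this post does, and lists which tables it contains:
let
    // Placeholders - replace <storageaccount> and <container> with your own values
    LakeRoot = AzureStorage.DataLake("https://<storageaccount>.dfs.core.windows.net/<container>/", [HierarchicalNavigation = true]),
    // Interpret the container as CDM content and pick the manifest describing the delta exports
    DeltasManifest = Cdm.Contents(LakeRoot){[Id = "deltas.manifest.cdm.json"]}[Data],
    // One row per exported BC table; the Data column holds that table's delta rows
    ExportedTables = Table.SelectColumns(DeltasManifest, {"Name"})
in
    ExportedTables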
Consolidating the deltas into their “current” form
Once you have the delta files, they need to be consolidated into a shape that reflects how Business Central currently looks, meaning we need to overwrite any former values with the new changes, add any new rows and delete any rows that were deleted since the previous consolidation.
Notice that consolidation and delta intervals are asynchronous by design and need not run on the same interval. Hopefully you’ll later learn why it can be highly advantageous and even cost-efficient to run the deltas much more frequently than the consolidation pipelines.
Here are three ways to implement this:
(Simplest) Never consolidate deltas, just add a Serverless SQL abstraction layer on top or load the data directly into Power BI
(Standard bc2adls design) Run the consolidation job every time you wish to update your data load
(Hybrid, best of both worlds) Run consolidation jobs much less frequently to clear up performance lag, then consolidate delta and data either in Power BI or Serverless SQL
In a soon-to-be-published second blog post I intend to follow up by discussing key components of each approach, their overall architecture, and a pro/con evaluation and comparison.
Until then, feel free to play around with this Power BI template file or M-code for Solution 1 - Simplest.
It loads a table’s entire list of deltas and converts them into the “Current” state of the table. Maybe you’ll find the limitations of this solution yourself, or be lucky enough that it works out fine for your use case!
Disclaimer: this needs all deltas, including the initial full load, to work. So if you ever ran the consolidation batch job and checked “Delete deltas”, or if your tables are simply too large, it will not work for now. (But stay tuned!)
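A note on the M-code below: it references three Power Query parameters that are not defined in the snippet itself, namely StorageAccount, Container and Table (the entity name as it appears in the deltas manifest). If you are not using the template file, here is a minimal sketch of how such parameters can be defined as individual queries, with placeholder values you would replace with your own:
StorageAccount:
"<storageaccount>" meta [IsParameterQuery = true, Type = "Text", IsParameterQueryRequired = true]
Container:
"<container>" meta [IsParameterQuery = true, Type = "Text", IsParameterQueryRequired = true]
Table:
"<table entity name from the deltas manifest>" meta [IsParameterQuery = true, Type = "Text", IsParameterQueryRequired = true]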
Power BI Template file
M-code:
let
    // StorageAccount, Container and Table are Power Query parameters (see the note above)
    // Load the deltas manifest from the data lake through the CDM connector
    DeltaSource = Cdm.Contents(AzureStorage.DataLake("https://" & StorageAccount & ".dfs.core.windows.net/" & Container & "/", [HierarchicalNavigation = true])),
    DeltasManifest = DeltaSource{[Id = "deltas.manifest.cdm.json"]}[Data],
    // Pick the delta rows for the chosen table and tag them as deltas
    TableDelta = DeltasManifest{[Name = Table]}[Data],
    AddedTypeDelta = Table.AddColumn(TableDelta, "LoadType", each "Delta"),
    // Deduplicate: sort by newest timestamp and keep only the latest version of each row (per systemId)
    #"Sorted Rows" = Table.Buffer(Table.Sort(AddedTypeDelta, {{"timestamp-0", Order.Descending}})),
    #"Removed Duplicates" = Table.Distinct(#"Sorted Rows", {"systemId-2000000000"}),
    // Remove rows without a SystemModifiedAt value (the synthetic delete entries), leaving the "Current" state
    #"Filtered Rows" = Table.SelectRows(#"Removed Duplicates", each ([#"SystemModifiedAt-2000000003"] <> null))
in
    #"Filtered Rows"