Channel: Simple Talk

High Concurrency Data Pipelines in Fabric

January 8, 2025, 2:17 pm

Data Pipelines can orchestrate many activities, creating a flow for data ingestion. One of these activities is the notebook execution activity.

However, every time a Data Pipeline executes a notebook, it creates a completely new session and Spark pool.

This makes the Data Pipeline very slow and expensive.

How Bad It Can Be

Imagine your pipeline runs a notebook inside a loop, and the loop executes the notebook many times.

Each execution means a completely new Spark pool, so you pay the startup cost on every iteration. This is expensive.
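
To get a feel for how this overhead adds up, here is a minimal sketch. The startup and work durations are hypothetical placeholders, not measurements from Fabric, and the function name is invented for illustration.

```python
def loop_duration_minutes(runs: int, startup_min: float, work_min: float) -> float:
    """Estimate a sequential loop where every run starts its own Spark session."""
    return runs * (startup_min + work_min)


# Hypothetical numbers, chosen only to illustrate the shape of the cost.
runs = 10          # loop iterations
startup_min = 2.0  # time to start a new session and pool, per run
work_min = 0.5     # actual notebook work, per run

total = loop_duration_minutes(runs, startup_min, work_min)
overhead = runs * startup_min  # time spent only on starting sessions

print(f"total: {total} min, of which startup overhead: {overhead} min")
```

With numbers like these, most of the loop's duration is session startup rather than notebook work.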

Besides being expensive, this will not run in parallel under the default configurations for a Spark session and a capacity. You will need to limit the number of parallel notebook executions in the ForEach activity, as in the image below.

[Image: ForEach activity settings limiting the number of parallel notebook executions]
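
For readers who prefer to see the setting outside the UI, the sketch below writes a ForEach activity definition as a Python dict and prints it as JSON. The property names `isSequential` and `batchCount` come from the Azure Data Factory pipeline schema; that Fabric stores the setting under the same names is an assumption. The activity name, the parameter, and the limit of 2 are hypothetical.

```python
import json

# Sketch of a ForEach activity definition as a Python dict.
# Property names follow the Azure Data Factory pipeline JSON schema;
# that Fabric uses the same names is an assumption.
for_each_activity = {
    "name": "RunNotebookPerItem",  # hypothetical activity name
    "type": "ForEach",
    "typeProperties": {
        "isSequential": False,  # allow iterations to run in parallel...
        "batchCount": 2,        # ...but at most 2 at a time (hypothetical limit)
        "items": {
            "value": "@pipeline().parameters.items",  # hypothetical parameter
            "type": "Expression",
        },
        "activities": [],  # the notebook activity would go here
    },
}

print(json.dumps(for_each_activity, indent=2))
```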

High Concurrency to the Rescue

The solution is to enable High Concurrency for Data Pipelines running notebooks. This can be done in two steps:

Enable this configuration in the workspace settings
Configure the session tag in the notebook activity

In the workspace settings, you will find this option under Spark Settings, as in the image below:

[Image: High Concurrency option in the workspace Spark Settings]

After that, the Session Tag setting on each notebook activity determines whether that activity uses the feature. Notebook activities with the same tag form a group that shares a session, and each group runs in a different session. You can use any string as the Session Tag.

[Image: Session Tag setting in the notebook activity]
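
The grouping idea can be sketched in a few lines. This is only a model of the behaviour described above, not Fabric's implementation: the activity names and tags are hypothetical, and treating every untagged activity as getting its own session is a simplifying assumption.

```python
from collections import defaultdict

# Hypothetical notebook activities and their Session Tag values (None = no tag).
activities = [
    ("LoadCustomers", "ingest"),
    ("LoadOrders", "ingest"),
    ("BuildReport", "reporting"),
    ("Cleanup", None),
]


def plan_sessions(items):
    """Group activity names by tag; untagged activities each get their own session."""
    shared = defaultdict(list)  # tag -> activities sharing one session
    standalone = []             # one single-activity session per untagged activity
    for name, tag in items:
        if tag is None:
            standalone.append([name])
        else:
            shared[tag].append(name)
    return list(shared.values()) + standalone


sessions = plan_sessions(activities)
print(len(sessions), sessions)
```

In this model, four notebook activities need three sessions instead of four, because the two activities tagged `ingest` share one.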

The High Concurrency Results

The image below shows a comparison between the execution without high concurrency and with high concurrency.

The execution time dropped from almost 13 minutes to less than 3.

[Image: execution time comparison without and with high concurrency]
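
As a rough worked calculation, the snippet below turns those two durations into a speedup and a percentage. The article gives only "almost 13" and "less than 3" minutes, so the round values 13 and 3 are approximations and the results are ballpark figures.

```python
before_min = 13.0  # "almost 13 minutes" without high concurrency (rounded)
after_min = 3.0    # "less than 3" minutes with high concurrency (rounded)

speedup = before_min / after_min
reduction_pct = (before_min - after_min) / before_min * 100

print(f"about {speedup:.1f}x faster, about {reduction_pct:.0f}% less time")
```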

References

Fabric Monday 55: Pipelines High Concurrency to Save Your Time and Money

Summary

If you plan to orchestrate notebooks using Data Pipelines, the High Concurrency configuration is essential for you.

The post High Concurrency Data Pipelines in Fabric appeared first on Simple Talk.
