Some honest initial thoughts on Day 1 of exploring MS Fabric vs Databricks. Today, let’s review Fabric’s pricing model.
Positive
1) After digging into the pricing model, including the concepts of bursting and smoothing, I understand why some businesses would appreciate the peace of mind this model brings.
With just one click, they can fully control their compute spending and have a mostly predictable bill.
I also think the concept of bursting and smoothing is a good one.
2) For companies with a limited engineering bench, OR relatively small workloads, OR one relatively sizeable workload a day, it makes sense that Fabric's pricing model appeals to them.
Even though Databricks' Serverless is solid, it has no cost controls. Fabric's pricing model is its own cost control mechanism, and cost controls matter in the current market.
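To make the smoothing idea from point 1 a bit more concrete, here is a rough Python sketch. It assumes, based on my reading of Microsoft's documentation, that background jobs have their consumption spread over a 24-hour window. The job size and the function names are made up for illustration, and the sketch ignores interactive operations and throttling rules entirely.

```python
# Sketch of smoothing: a burst of compute is counted against the capacity
# as if it were spread evenly over a longer window.
# Assumption: a 24-hour window for background jobs; the job below is hypothetical.

SECONDS_PER_DAY = 24 * 60 * 60


def smoothed_cu_load(job_cu_seconds: float, window_seconds: int = SECONDS_PER_DAY) -> float:
    """Average capacity units (CUs) a job occupies once spread over the window."""
    return job_cu_seconds / window_seconds


def fits_capacity(jobs_cu_seconds: list[float], capacity_cus: float) -> bool:
    """True if the day's jobs, once smoothed, stay within the purchased capacity."""
    total_load = sum(smoothed_cu_load(job) for job in jobs_cu_seconds)
    return total_load <= capacity_cus


# Hypothetical nightly job that bursts to 32 CUs for one hour.
nightly_job = 32 * 3600  # CU-seconds consumed by the job

# Average CUs the job occupies once smoothed over 24 hours.
print(smoothed_cu_load(nightly_job))

# Whether that single job fits within an F4 (4 CUs) after smoothing.
print(fits_capacity([nightly_job], 4))
```

The takeaway is that one sizeable daily job can burst well above the purchased capacity and still fit once averaged out, which is why the model suits the "one relatively sizeable workload a day" case.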
Negative
1) It is always on, even if you are not using it.
Databricks and similar platforms have what is called "auto-suspend" (Databricks itself calls it auto termination for clusters and auto stop for SQL warehouses), meaning you are not paying for compute that you are not using.
Unlike the pausing that you can do in Fabric, restarting the compute in Databricks happens automatically.
You could develop some automation logic for handling pauses and restarts outside of Fabric, but half of the draw to Fabric is simplicity without deep engineering expertise, and automation logic requires exactly that kind of expertise.
The point is, you are always paying and, very likely, wasting money by underutilizing your compute.
2) Need more firepower? You will have to double your spend.
From past experience with Azure products, this is an EXTREMELY dangerous pricing model.
Especially in organizations with a thin data engineering bench, I can almost guarantee that the most common answer to performance problems is going to be "let's throw more compute at this".
Instead of getting the marginal amount of compute needed (beyond bursting), your compute bill will effectively double, growing at a rate much faster than your data and needs.
This is not a problem in Databricks, where you have multiple types of compute that grow proportionally with your needs.
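Both negatives can be put into rough numbers. The Python sketch below relies on the fact that Fabric's F SKUs double in size at each step (F2, F4, F8, and so on, where the number is the count of capacity units). Everything else, including the hourly price, the usage hours, and the function names, is a hypothetical placeholder rather than real pricing.

```python
# Sketch of the two cost effects described above.
# All prices and workloads are hypothetical.

# Fabric F SKUs double at each step; the number is the capacity units (CUs).
F_SKUS = [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048]


def smallest_sku_for(needed_cus: float) -> int:
    """Smallest F SKU whose capacity covers the requested CUs."""
    for sku in F_SKUS:
        if sku >= needed_cus:
            return sku
    raise ValueError("need exceeds the largest SKU in this list")


def overprovision_ratio(needed_cus: float) -> float:
    """Capacity you pay for, relative to the capacity you actually need."""
    return smallest_sku_for(needed_cus) / needed_cus


def cost_per_used_hour(hourly_price: float, hours_used_per_day: float) -> float:
    """Effective price per hour of real work on a capacity that runs 24 hours a day."""
    return hourly_price * 24 / hours_used_per_day


# Needing 10% more than an F8 (8.8 CUs) pushes you to the next SKU up.
print(smallest_sku_for(8.8), overprovision_ratio(8.8))

# An always-on capacity at a made-up 1.00 per hour, doing real work 6 hours a day.
print(cost_per_used_hour(1.00, 6))
```

Under these assumptions, a small increase in need nearly doubles the capacity you pay for, and a capacity that sits idle most of the day costs several times its list price per hour of useful work.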
Closing Thoughts
1) I believe that, from a cost perspective, Databricks is the better value, especially as data volumes grow. I base this on my experience running complex pipelines that cost a little under the price of an F4 but would easily have needed a capacity a few tiers higher in Fabric.
2) That said, pricing simplicity is certainly valuable, and I understand why smaller businesses would find Fabric attractive in this regard.
I hope you found this useful!
Helpful Links
You can find all of the articles in this series at:
Fabrics vs Databricks Conclusion
I would also love for you to follow me on LinkedIn @ in/JosueBogran and YouTube @ JosueBogranChannel.
I'm curious about the always-on part. I have worked with Databricks long enough to know how that works. https://learn.microsoft.com/en-us/fabric/enterprise/pause-resume Azure also posts some material about Pause and Resume. Is it really always on? Or is it more that it doesn't have the automation around it, and you'd need to pause it manually or hit the API?