
The first half of 2025 has been a transformative period for Databricks, with the company rolling out a series of major feature announcements that signal its ambition to dominate the modern data stack. From serverless capabilities to AI-powered analytics, these releases showcase both evolutionary improvements and revolutionary new approaches to data engineering and analytics.
While some competitors focus on individual components, Databricks is clearly betting on an integrated ecosystem approach. The question for data teams is whether these new capabilities live up to their promise - and which ones are ready for production workloads today.
This analysis covers the key announcements from the first half of the year - with a particular focus on this year’s Data + AI Summit, which saw the highest concentration of updates (both the widely publicized ones and those that were quietly released, such as enhancements to Auto Loader, the new Google Sheets connector, serverless workspaces, and the dbt Cloud task in Workflows).
I'll also cover the current state of these features: what’s available today, what’s working well, and what’s still evolving.
Lakeflow Connect – Unified Data Ingestion
Perhaps the most comprehensive announcement was Lakeflow Connect, representing Databricks' attempt to standardize data ingestion across the entire ecosystem. This isn't just about adding more connectors - it's about creating a unified experience that eliminates the complexity of managing multiple data pipelines.
Lakeflow Connect is Databricks' unified ecosystem for ingesting data from virtually any source - cloud storage, databases, SaaS applications, or existing Delta tables.
It consists of three main components:
Auto Loader
Declarative Pipelines
Managed Connectors
Auto Loader: Key feature enhancements include:
Easier ingestion with new file events: No need to worry about creating separate queues for separate streams.

Credit: Databricks
File Lifecycle Management: You can now automatically archive or delete source files based on a defined retention period.
Direct connection to SFTP locations
Two new data formats: XML and Excel
Schema Evolution: Add new columns with type widening
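To make these concrete, here's a minimal Auto Loader sketch, assuming a Databricks notebook context where `spark` is available. The paths are examples, and the option name for the new file-events mode is my best guess from the announcement, so verify it against the current Auto Loader docs.

```python
# Minimal Auto Loader sketch (assumes a Databricks notebook where `spark` exists).
# Paths are examples; the file-events option name is an assumption from the announcement.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                        # XML and Excel are now also supported
    .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schemas/orders")
    .option("cloudFiles.schemaEvolutionMode", "addNewColumns")  # pick up new columns automatically
    .option("cloudFiles.useManagedFileEvents", "true")          # assumed option name for file events
    .load("/Volumes/main/raw/orders/")
)

(
    df.writeStream
    .option("checkpointLocation", "/Volumes/main/raw/_checkpoints/orders")
    .toTable("main.bronze.orders")
)
```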
Declarative Pipelines:
Delta Live Tables (DLT) has been rebranded under Lakeflow as Declarative Pipelines, now positioned as a core capability in the Databricks ecosystem.
Key new features include:
Improved UI/IDE designed for data engineering workflows
Multi-schema support
Enzyme for materialized views
Seamless integration with Managed Connectors
Auto CDC replaces apply_changes for SCD Type 1 and Type 2 operations (see the sketch below)
Trigger on update for materialized views instead of scheduling them manually

Credit: Databricks
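Here's a minimal sketch of what that Auto CDC flow looks like in a declarative pipeline. The create_auto_cdc_flow name follows the announcement; on runtimes that haven't picked up the rename, the equivalent call is dlt.apply_changes with the same arguments, so treat the exact function name as an assumption.

```python
# Sketch of a declarative pipeline using the new Auto CDC flow (SCD Type 2).
# `create_auto_cdc_flow` is the announced successor to `apply_changes`; older
# runtimes expose the same arguments under dlt.apply_changes.
import dlt
from pyspark.sql.functions import col

@dlt.view
def customers_updates():
    # Raw change feed landed by Auto Loader or a managed connector (example table name)
    return spark.readStream.table("main.bronze.customers_cdc")

dlt.create_streaming_table("customers_silver")

dlt.create_auto_cdc_flow(
    target="customers_silver",
    source="customers_updates",
    keys=["customer_id"],
    sequence_by=col("updated_at"),
    stored_as_scd_type=2,   # keep full history (SCD Type 2)
)
```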
A Lakeflow No-Code Designer was showcased in the keynote, allowing drag-and-drop pipeline creation with AI assistance. However, this feature has not yet been released.
Lakeflow Connect Managed Connectors
You can now easily connect to a wide range of data sources using Lakeflow Connect managed connectors - whether they’re relational databases like SQL Server and PostgreSQL, or SaaS platforms like Salesforce. Some of these managed connectors already support advanced features such as Slowly Changing Dimensions (SCD Type 1 and Type 2) out of the box.
A number of connectors are generally available, while others are in private preview or part of the upcoming roadmap.
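If you prefer code over clicks, ingestion pipelines can also be created programmatically. The sketch below uses the Databricks Python SDK; the ingestion-definition fields, connection name, and catalog/schema names are assumptions on my part for illustration, so check the Lakeflow Connect docs for the exact payload your connector expects.

```python
# Hypothetical sketch: creating a Salesforce ingestion pipeline via the Databricks SDK.
# Field names follow the pipelines API but should be verified; the connection, catalog,
# and schema names are made up. Additional settings (target schema, serverless compute,
# schedules) would normally be supplied as well.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()

created = w.pipelines.create(
    name="salesforce_accounts_ingest",
    ingestion_definition=pipelines.IngestionPipelineDefinition(
        connection_name="salesforce_conn",  # Unity Catalog connection created beforehand
        objects=[
            pipelines.IngestionConfig(
                table=pipelines.TableSpec(
                    source_table="Account",
                    destination_catalog="main",
                    destination_schema="crm",
                )
            )
        ],
    ),
)
print(created.pipeline_id)
```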

Credit: Databricks
Looking Ahead: What’s Coming
Query Pushdown: Push filtering and transformation logic to the source system to reduce data movement and significantly improve performance.
Row-Level Filtering & Column Masking: Apply fine-grained data access controls in flight to protect sensitive information like PII.
Expanded Connector Support: New connectors for platforms like Snowflake, Amazon Redshift, MySQL, and Google BigQuery are in the pipeline.
Lakebase – Serverless Postgres with OLTP/OLAP
Moving beyond data ingestion, Databricks made a bold move into the database space with Lakebase, the result of its acquisition of Neon. It's a fully managed, serverless Postgres database that combines OLTP and OLAP capabilities in a single system, with separated storage and compute for high scalability and performance.
To address the well-known latency issues with object storage like S3 and Azure Blob, Databricks added a middle caching layer that stores soft state. This allows queries to bypass slow storage paths and hit the cache directly, bringing latency down from around 100 ms to as low as 10 ms.
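Since Lakebase is Postgres, connecting to it looks like connecting to any other Postgres instance. Here's a minimal sketch using a standard driver; the host, database, and credentials are placeholders, not real endpoints.

```python
# Connecting to a Lakebase instance with a standard Postgres driver (psycopg2).
# Host, database, user, and password are placeholders -- use your instance's
# connection details and a credential issued from the workspace.
import psycopg2

conn = psycopg2.connect(
    host="your-lakebase-instance.database.cloud.databricks.com",  # placeholder
    port=5432,
    dbname="databricks_postgres",   # placeholder database name
    user="your_user",
    password="your_token",
    sslmode="require",
)

with conn, conn.cursor() as cur:
    cur.execute("SELECT order_id, status FROM orders WHERE status = %s", ("open",))
    for row in cur.fetchall():
        print(row)

conn.close()
```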
Features not yet supported in Lakebase:
Serverless scale-to-zero: Compute doesn't currently scale down to zero when not in use, so cost monitoring is essential.
Branching (copy-on-write): Enables instant, storage-efficient clones of your database; you don't incur storage costs until you start making changes. This is ideal for experimentation and testing.

Credit: Databricks
Databricks SQL
On the SQL front, Databricks continues to bridge the gap between traditional data warehousing and modern analytics. The additions here are particularly relevant for teams migrating from legacy systems or those who prefer SQL-centric workflows.
Key additions include:
New AI functions such as ai_extract, vector_search, and ai_query (a quick example follows below)
Stored procedures: Great for people comfortable with legacy warehousing, and much more powerful than UDFs.
Multi-statement transactions with optimistic and row-level concurrency.

Credit: Databricks
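Here's a quick sketch of calling the new AI functions from a notebook via spark.sql. The serving endpoint name and the source table are placeholders, so swap in whatever exists in your workspace.

```python
# AI functions in Databricks SQL, invoked from a notebook via spark.sql.
# The model serving endpoint and table names are placeholders.
result = spark.sql("""
    SELECT
      review,
      ai_query(
        'databricks-meta-llama-3-3-70b-instruct',   -- placeholder serving endpoint
        CONCAT('Classify the sentiment of this review as positive, negative, or neutral: ', review)
      ) AS sentiment,
      ai_extract(review, array('product', 'issue')) AS entities
    FROM main.reviews.raw_reviews
    LIMIT 10
""")
result.show(truncate=False)
```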
Databricks AI/BI Dashboards
The AI/BI dashboard improvements show Databricks' commitment to democratizing data analytics. These features are designed to reduce the technical barrier for business users while still providing the depth that analysts need.
AI Forecasting (Beta): Add predictive insights directly into your dashboards using integrated forecasting models.
Top Driver Analysis (Private Preview): Automatically surface the key factors influencing trends and anomalies in your data.
Genie Deep Research: Scheduled for release later this summer, Genie will bring deep contextual insights and guided analysis into BI workflows.
Drill-Through Capability (Private Preview): Enables users to explore underlying data details interactively.
BI and Partner Tools
One of the most practical aspects of this year’s Databricks Data + AI Summit was the focus on integration with existing tools. Rather than forcing users to abandon their current workflows, Databricks is meeting them where they are - in Excel, Google Sheets, Power BI, and Tableau.
Run dbt Cloud Jobs in Workflows (Private Preview): Integrate dbt jobs directly into Databricks Workflows for end-to-end orchestration.
Power BI Task in Workflows: Launched earlier this year, allowing automated Power BI report refreshes as part of Databricks workflows.
Simplified Tableau Cloud Connectivity: Easily connect to Tableau Cloud in just a few steps with updated guidance and tooling.
Arrow Database Connectivity for Power BI (Coming Soon): Promises faster, more efficient data transfer for large models and real-time dashboards.
Databricks Connector for Google Sheets: Easily pull live data from Databricks into Google Sheets for collaborative analysis and reporting.
For me, the Databricks Connector for Google Sheets was one of the standout announcements. It makes it significantly easier for business users to access and work with data directly without needing to rely on complex tooling or technical teams.
And hold on, there's more: a connector for Microsoft Excel is also coming soon, which will further empower business users by bringing Databricks data into their most familiar tools.

Databricks Free Version and Serverless Workspaces
Drum roll please! Databricks now offers a free version for everyone. This is a significant upgrade from the previous trial offering, which was limited to 14 days and $40 in credits - something I often exhausted within just a couple of days. For comparison, Snowflake offers a 30-day trial with $400 credits.
This was perhaps the most democratizing announcement, as it introduced a truly free tier. And this isn't just a marketing move - it's a strategic play to get Databricks into the hands of individual developers and small teams who might become enterprise customers later.

What stood out during setup was how frictionless the experience has become. With Serverless Workspaces, I no longer have to deal with setting up credentials or configuring external storage locations. Everything is handled behind the scenes.
LakeBridge – Legacy ETL Migration
For organizations with significant legacy ETL investments, LakeBridge represents a potential game-changer. Rather than forcing a complete rewrite, it promises to automate much of the migration process - though as with any automated translation tool, the results will likely require human oversight.
LakeBridge is the result of Databricks’ acquisition of Bladebridge, now rebranded and integrated into their ecosystem.

Credit: Databricks
It’s designed to analyze and convert legacy ETL code into modern PySpark or SQL, depending on the source language. It also includes validation capabilities to ensure the translated logic behaves as expected. Databricks claims around 70% conversion accuracy and is actively working to improve it.
Currently, LakeBridge is a CLI-based tool, but a UI-based version is in the works to make it more accessible and user-friendly.
Managed Iceberg Tables
In the ongoing format wars between Delta Lake and Apache Iceberg, Databricks is taking a pragmatic approach. Rather than forcing users to choose, they're providing interoperability - a smart move that reduces switching costs and vendor lock-in concerns.
This feature brings true interoperability to Databricks.
Managed Iceberg Tables generate metadata compatible with both Delta and Iceberg formats, enabling you to access your data seamlessly across multiple query engines.
If you're using other Iceberg-compatible systems, you can now read and write the same tables without complex conversions.
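Here's a minimal sketch of what this looks like in practice, assuming the USING ICEBERG syntax from the preview; the catalog, schema, and table names are made up.

```python
# Creating a managed Iceberg table and reading it back from Databricks.
# Syntax follows the managed Iceberg preview; names are examples only.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders_iceberg (
      order_id   BIGINT,
      customer   STRING,
      amount     DOUBLE,
      order_date DATE
    )
    USING ICEBERG
""")

spark.sql("INSERT INTO main.sales.orders_iceberg VALUES (1, 'Acme', 250.0, DATE'2025-06-15')")

# Other Iceberg engines can read and write the same table through Unity Catalog's
# Iceberg REST endpoint; from Databricks it behaves like any other table.
spark.table("main.sales.orders_iceberg").show()
```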

Metrics View
The concept of a semantic layer isn't new, but Databricks' implementation appears to focus on simplicity and business user accessibility. This could be particularly valuable for organizations struggling with metric consistency across different tools and teams.
Metrics View serves as a logical layer defined on top of your datasets, where each metric or KPI is centrally defined and managed. The beauty of it is that business users can also create and understand metrics without needing to write complex queries.
This becomes your single source of truth for analytics, ensuring consistency across dashboards, reports, and teams. Overall, it’s a powerful addition for driving trustworthy, self-serve BI.
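Here's a small sketch of defining and querying a metric view, based on the YAML-based syntax in the preview. Treat the exact YAML fields as assumptions and check the metric views docs; the table and column names are examples.

```python
# Defining a metric view (YAML body) and querying it with MEASURE().
# The YAML schema follows the preview docs but should be verified; names are examples.
spark.sql("""
    CREATE VIEW main.analytics.sales_metrics
    WITH METRICS
    LANGUAGE YAML
    AS $$
      version: 0.1
      source: main.sales.orders
      dimensions:
        - name: region
          expr: region
      measures:
        - name: total_revenue
          expr: SUM(amount)
    $$
""")

# Business users and dashboards query measures without rewriting the aggregation logic.
spark.sql("""
    SELECT region, MEASURE(total_revenue) AS total_revenue
    FROM main.analytics.sales_metrics
    GROUP BY region
""").show()
```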

A few gotchas with Metrics View:
Currently, it doesn't have full integration with all BI tools; it works with Omni and Databricks AI/BI. You can also use APIs to fetch the metrics.
Other notable feature announcements include:
Agent Bricks
Multi-Agent Supervisor
Managed MCP Servers
Managed MLflow Prompt Registry
I’ll be covering each of these features in detail in upcoming blog posts.
