Blog - DriveDataScience

Lakehouse vs Warehouse in Microsoft Fabric: When to Use Which, What Languages Work Where, and Real-World Scenario Guide

Azure, Data Engineering

The definitive Lakehouse vs Warehouse guide for Microsoft Fabric. Side-by-side comparison across 17 features, the SQL analytics endpoint explained (why it is read-only), a languages and interfaces matrix (PySpark, SparkSQL, T-SQL: what works where), a read vs write capabilities table, security model differences, five real-world scenarios (e-commerce ETL, financial reporting, IoT, Customer 360 with ML, self-service analytics), the recommended Medallion pattern (Lakehouse for Bronze/Silver, Warehouse for Gold), cross-database queries, and a migration guide from Synapse/Databricks.

SQL Normalization and Star Schema: 1NF, 2NF, 3NF, Dimensional Modeling, and Designing Databases Like a Data Engineer

Data Engineering, SQL

Database design from both sides: normalization and dimensional modeling. Normalization: 1NF (atomic values), 2NF (no partial dependencies), 3NF (no transitive dependencies) with real examples. Dimensional modeling: star schema with fact tables (measures) and dimension tables (context), snowflake schema, star vs snowflake comparison, surrogate vs natural keys, junk/degenerate/role-playing dimensions, complete star schema SQL, and how the star schema maps to the Medallion Architecture covered in our other blog posts.

SQL Stored Procedures, Functions, and Triggers: Reusable SQL Logic, Automation, and When to Use Each

Data Engineering, SQL

Automate SQL with stored procedures, functions, and triggers. Procedures with input/output parameters, TRY/CATCH error handling, and our pipeline logging procedure. Scalar functions and table-valued functions with use cases. AFTER triggers for audit logging, INSTEAD OF triggers for soft deletes, and the inserted/deleted tables. Procedures vs functions comparison, three real-world patterns, and trigger best practices.

SQL Views, Temp Tables, Table Variables, and CTEs: When to Use Which and Why

Data Engineering, SQL

Four ways to hold intermediate results, compared. Views (saved queries, security layer, updatable), temp tables (session-level, indexable, large data), table variables (batch-level, small data, no statistics), CTEs (single query, readability). Complete comparison table, decision tree, materialized/indexed views, three production patterns (staging with temp table, security with views, CTE for readability), and the table variable statistics trap.

SQL Indexes and Execution Plans: How Databases Find Data, Why Queries Are Slow, and How to Fix Them

Data Engineering, SQL

Master SQL performance with the book index analogy. Table scan vs index seek, clustered vs non-clustered indexes, composite indexes with leftmost prefix rule, covering indexes with INCLUDE, reading execution plans, five common slow query patterns with fixes (missing index, function on column, leading wildcard, implicit conversion, SELECT *), index fragmentation and rebuild, and the index design checklist.
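
As a quick taste of the leftmost prefix rule mentioned above, here is a minimal sketch. It assumes SQLite (via Python's built-in sqlite3 module) as a stand-in engine so the example runs anywhere; the post itself may target a different database, and the orders table and index name are hypothetical. The idea is the same: a composite index on (customer_id, order_date) can serve a filter on customer_id, but not a filter on order_date alone.

```python
import sqlite3

# In-memory SQLite database; table and index names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT, amount REAL)"
)
# Composite index: customer_id is the leading (leftmost) column.
conn.execute(
    "CREATE INDEX ix_orders_customer_date ON orders (customer_id, order_date)"
)


def plan(sql):
    """Return SQLite's query plan for a statement as a single string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail text is the fourth column of each plan row.
    return " | ".join(row[3] for row in rows)


# Filter on the leading column: the index can be used for a seek.
leading = plan("SELECT * FROM orders WHERE customer_id = 42")

# Filter on the second column only: the index order does not help,
# so the engine falls back to scanning the table.
trailing = plan("SELECT * FROM orders WHERE order_date = '2024-01-01'")

print("leading column :", leading)
print("trailing column:", trailing)
```

The exact plan wording varies by SQLite version, but the first plan should name the index and the second should report a scan. Other engines expose the same behaviour through their own execution plan tools.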

SQL DDL, DML, and Constraints: CREATE, ALTER, DROP, INSERT, UPDATE, DELETE, MERGE, and Database Design Fundamentals

Data Engineering, SQL

The complete DDL, DML, and constraints guide. CREATE TABLE with all data types, every constraint explained (PK, FK, UNIQUE, CHECK, DEFAULT, NOT NULL, composite keys), CASCADE options, ALTER TABLE, DROP vs TRUNCATE vs DELETE comparison, INSERT (single, multi, from SELECT), UPDATE with joins, DELETE with subqueries, MERGE for upsert with three clauses, IDENTITY auto-increment, SELECT INTO, and schema design best practices.

SQL Functions Every Data Engineer Must Know: String, Date, Numeric, Null Handling, Conversion, and Conditional Functions

Data Engineering, SQL

The complete SQL function reference for data engineers. 50+ functions organized by category: string (TRIM, CONCAT, REPLACE, SUBSTRING, STUFF, STRING_AGG), date (DATEDIFF, DATEADD, DATEPART, EOMONTH, FORMAT), numeric (ROUND, CEILING, FLOOR, ABS, MOD), null handling (COALESCE, NULLIF, ISNULL, IS DISTINCT FROM), conversion (CAST, TRY_CAST, CONVERT), and conditional (IIF, CASE). Includes SQL Server vs PostgreSQL vs MySQL comparison table and complete data cleaning pipeline.
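
To give a flavour of the null handling and conversion functions listed above, here is a small sketch. It assumes SQLite (through Python's built-in sqlite3 module) purely so the snippet is self-contained; only functions that SQLite shares with the other engines are used, and the sample values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One row of cleaning expressions; every input literal is illustrative.
row = conn.execute(
    """
    SELECT
        COALESCE(NULL, NULL, 'fallback'),   -- first non-NULL argument
        NULLIF('N/A', 'N/A'),               -- placeholder text becomes NULL
        NULLIF('Pune', 'N/A'),              -- real value passes through
        CAST('42' AS INTEGER),              -- text to integer
        TRIM('  padded  '),                 -- strip surrounding spaces
        100.0 / NULLIF(0, 0)                -- NULL instead of divide-by-zero
    """
).fetchone()

coalesced, placeholder, city, number, trimmed, safe_ratio = row
print(coalesced, placeholder, city, number, repr(trimmed), safe_ratio)
```

NULL comes back to Python as None, so the placeholder value and the guarded division both print as None. Engine-specific functions from the list (for example ISNULL, TRY_CAST, or IIF) are left out here because their availability differs between databases.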

Fabric Data Factory: Activities, Pipelines, Dataflow Gen2, Notebooks, and Building Production ETL in Microsoft Fabric

Azure, Data Engineering

The complete Fabric Data Factory guide. What changed from ADF (no datasets, connections instead of linked services, Dataflow Gen2 instead of Mapping Data Flows). All pipeline activities listed: data movement, transformation, control flow, notification (Teams, Outlook — NEW), and Fabric-specific (Semantic Model Refresh). Three complete pipeline examples including metadata-driven load and full Medallion ETL combining Copy + Dataflow Gen2 + Notebook + Power BI Refresh + Teams notification.

OneLake Shortcuts in Microsoft Fabric: Every Source, Every Permission, and How to Access Data Without Copying It

Azure, Data Engineering

Master OneLake shortcuts in Fabric. Every supported source (ADLS Gen2, S3, S3-compatible, GCS, Dataverse, on-premises, Iceberg), read/write/delete behavior per source, the delete trap explained, two-layer security model, authentication methods per source, shortcut caching for cross-cloud cost savings, chained shortcuts, Direct Lake with shortcuts for Power BI, trusted workspace access for private ADLS, four real-world patterns, and step-by-step creation guide.

Artificial Intelligence and Machine Learning for Data Engineers: What It Actually Is, How Companies Use It, and the Complete Introduction Before You Touch an Algorithm

Data Engineering, SQL

The complete AI and ML introduction for data engineers — not hype, reality. AI vs ML vs DL vs GenAI hierarchy, supervised vs unsupervised vs reinforcement learning, classification vs regression with decision framework, every traditional ML algorithm and deep learning algorithm with analogies, real-world ML use cases across 6 industries, the ML project lifecycle, where data engineers fit, feature engineering as the bridge, and the complete learning path forward.
