fabric-lakehouse
github/awesome-copilot
Reference knowledge for designing and optimizing Microsoft Fabric Lakehouse solutions.
What is fabric-lakehouse?
A reference skill that provides context, definitions, and best practices about Microsoft Fabric Lakehouse, covering its data components, schema/shortcut organization, security model, performance optimization, and code examples. Use it when designing, building, explaining, or optimizing Fabric Lakehouse solutions.
- Explains core Fabric Lakehouse concepts: Delta Tables, Files, SQL Endpoint, Shortcuts, and Materialized Views
- Describes how to organize tabular and non-tabular data using schemas and folders
- Documents item-level (workspace roles) and data-level (OneLake RBAC, column/row-level) security models
- Covers shortcut types for linking to internal and external data sources (ADLS Gen2, S3, Dataverse, GCS)
- Outlines performance optimization techniques including V-Order, OPTIMIZE, and Vacuum
- Points to reference files for PySpark code examples and data ingestion guidance
How to install fabric-lakehouse
npx skills add https://github.com/github/awesome-copilot --skill fabric-lakehouse
- Working knowledge of Microsoft Fabric and its workspace/item model is helpful for applying the concepts described.
How to use fabric-lakehouse
1. Invoke the skill when working on a task involving Microsoft Fabric Lakehouse design, explanation, or documentation.
2. Reference the core concepts section to explain or decide on Lakehouse components like Delta Tables, Files, SQL Endpoint, Shortcuts, and Materialized Views.
3. Use the schema and shortcut guidance to plan how tables and data should be organized within the Lakehouse.
4. Apply the security section when designing or explaining access control at the item (workspace roles) and data (OneLake RBAC) levels.
5. Use the performance optimization section (V-Order, OPTIMIZE, Vacuum) when advising on Lakehouse maintenance and query performance.
6. Consult the linked references/pyspark.md and references/getdata.md files for concrete PySpark code examples and data ingestion guidance.
Use cases
- Explaining what a Fabric Lakehouse is and how its components (Delta Tables, Files, SQL Endpoint, Shortcuts, Materialized Views) work
- Designing schema organization and shortcut strategy for tables in a Lakehouse
- Advising on item-level and data-level (OneLake) security and access control for Lakehouse data
- Recommending performance optimization techniques like V-Order, OPTIMIZE, and Vacuum for Delta tables
- Writing PySpark code or planning data ingestion into a Lakehouse using the included reference docs
- Data engineers working with Microsoft Fabric
- Solution architects designing Lakehouse-based data platforms
- AI agent developers needing accurate context about Fabric Lakehouse for generating documentation or code
- BI developers integrating Lakehouse data with Power BI semantic models
fabric-lakehouse FAQ
No, it is a reference/context skill. It provides documentation and explanations about Fabric Lakehouse concepts, organization, security, and code examples to inform how an agent designs or explains Lakehouse solutions.
Delta is the main table format (with ACID transactions, versioning, and time travel). CSV and Parquet are also supported but only queryable via Spark, not the SQL endpoint.
Item-level access uses workspace roles (Admin, Member, Contributor, Viewer) and sharing. Data-level access uses the OneLake security model based on Microsoft Entra ID and RBAC, including column-level and row-level security.
Shortcuts are virtual links to data without copying it. Types include Internal (other Fabric Lakehouses), ADLS Gen2, Amazon S3, Dataverse, and Google Cloud Storage shortcuts.
Yes, it references separate documents for PySpark code examples and for getting data into a Lakehouse.
Full instructions (SKILL.md)
Source of truth, from github/awesome-copilot.
name: fabric-lakehouse
description: 'Use this skill to get context about Fabric Lakehouse and its features for software systems and AI-powered functions. It offers descriptions of Lakehouse data components, organization with schemas and shortcuts, access control, and code examples. This skill supports users in designing, building, and optimizing Lakehouse solutions using best practices.'
metadata:
  author: tedvilutis
  version: "1.0"
When to Use This Skill
Use this skill when you need to:
- Generate a document or explanation that includes definition and context about Fabric Lakehouse and its capabilities.
- Design, build, and optimize Lakehouse solutions using best practices.
- Understand the core concepts and components of a Lakehouse in Microsoft Fabric.
- Learn how to manage tabular and non-tabular data within a Lakehouse.
Fabric Lakehouse
Core Concepts
What is a Lakehouse?
A Lakehouse in Microsoft Fabric is an item that stores both tabular data (tables) and non-tabular data (files). It combines the flexibility of a data lake with the management capabilities of a data warehouse. It provides:
- Unified storage in OneLake for structured and unstructured data
- Delta Lake format for ACID transactions, versioning, and time travel
- SQL analytics endpoint for T-SQL queries
- Semantic model for Power BI integration
- Support for other table formats such as CSV and Parquet
- Support for any file format
- Tools for table optimization and data management
Key Components
- Delta Tables: Managed tables with ACID compliance and schema enforcement
- Files: Unstructured/semi-structured data in the Files section
- SQL Endpoint: Auto-generated read-only SQL interface for querying
- Shortcuts: Virtual links to external/internal data without copying
- Fabric Materialized Views: Pre-computed tables for fast query performance
Tabular data in a Lakehouse
Tabular data, in the form of tables, is stored under the "Tables" folder. The main table format in a Lakehouse is Delta. A Lakehouse can also store tabular data in other formats such as CSV or Parquet, but those formats can only be queried with Spark. Tables can be internal, when the data itself is stored under the "Tables" folder, or external, when only a reference to the table is stored under the "Tables" folder and the data lives in the referenced location. External tables are referenced through Shortcuts, which can be internal (pointing to another location in Fabric) or external (pointing to data stored outside of Fabric).
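The layout described above can be made concrete with a short Python sketch that builds the OneLake path for an item under "Tables" or "Files". The workspace and lakehouse names are hypothetical, and `read_table` assumes a Fabric notebook where `spark` is the active SparkSession; treat it as an illustration rather than the skill's reference code.

```python
ONELAKE_HOST = "onelake.dfs.fabric.microsoft.com"


def onelake_path(workspace: str, lakehouse: str, section: str, relative: str) -> str:
    """Build an ABFS URI for an item under the Tables or Files section."""
    if section not in ("Tables", "Files"):
        raise ValueError("section must be 'Tables' or 'Files'")
    return (
        f"abfss://{workspace}@{ONELAKE_HOST}/"
        f"{lakehouse}.Lakehouse/{section}/{relative.strip('/')}"
    )


def read_table(spark, workspace: str, lakehouse: str, table: str):
    """Load a Delta table by path; `spark` is the notebook's SparkSession."""
    path = onelake_path(workspace, lakehouse, "Tables", table)
    return spark.read.format("delta").load(path)


# Hypothetical names, for illustration only.
print(onelake_path("SalesWorkspace", "SalesLakehouse", "Tables", "orders"))
# abfss://SalesWorkspace@onelake.dfs.fabric.microsoft.com/SalesLakehouse.Lakehouse/Tables/orders
```

The same helper works for the "Files" section, for example `onelake_path("SalesWorkspace", "SalesLakehouse", "Files", "raw/2024")`.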
Schemas for tables in a Lakehouse
When creating a lakehouse, users can choose to enable schemas. Schemas are used to organize Lakehouse tables. Schemas are implemented as folders under the "Tables" folder and store tables inside of those folders. The default schema is "dbo" and it can't be deleted or renamed. All other schemas are optional and can be created, renamed, or deleted. Users can reference a schema located in another lakehouse using a Schema Shortcut, thereby referencing all tables in the destination schema with a single shortcut.
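As a rough sketch of how schemas map to folders and names, the helpers below build the folder path and the Spark SQL statements for a schema-enabled Lakehouse. The helper names are hypothetical, the `lakehouse.schema.table` naming is an assumption to check against your environment, and the guard on "dbo" simply mirrors the rule stated above.

```python
DEFAULT_SCHEMA = "dbo"  # present in every schema-enabled Lakehouse


def table_folder(table: str, schema: str = DEFAULT_SCHEMA) -> str:
    """Folder that holds a table: schemas are folders under Tables."""
    return f"Tables/{schema}/{table}"


def qualified_name(lakehouse: str, table: str, schema: str = DEFAULT_SCHEMA) -> str:
    """Name for Spark SQL, assuming lakehouse.schema.table resolution."""
    return f"{lakehouse}.{schema}.{table}"


def create_schema_sql(schema: str) -> str:
    """Statement that creates an optional schema if it is missing."""
    return f"CREATE SCHEMA IF NOT EXISTS {schema}"


def drop_schema_sql(schema: str) -> str:
    """Statement that drops an optional schema; refuses the default one."""
    if schema.lower() == DEFAULT_SCHEMA:
        raise ValueError("the default schema 'dbo' cannot be deleted")
    return f"DROP SCHEMA IF EXISTS {schema}"


print(table_folder("orders", "sales"))             # Tables/sales/orders
print(qualified_name("SalesLakehouse", "orders"))  # SalesLakehouse.dbo.orders
```

In a notebook, the generated statements would be passed to `spark.sql(...)`.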
Files in a Lakehouse
Files are stored under the "Files" folder. Users can create folders and subfolders to organize their files. Any file format can be stored in a Lakehouse.
Fabric Materialized Views
Fabric Materialized Views are sets of pre-computed tables that are refreshed automatically on a schedule. They provide fast query performance for complex aggregations and joins. Materialized views are defined using PySpark or Spark SQL and stored in an associated Notebook.
Spark Views
Spark views are logical tables defined by a SQL query. They do not store data but provide a virtual layer for querying. Views are defined using Spark SQL and stored in the Lakehouse next to Tables.
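A minimal sketch of defining such a view from a notebook follows. It uses the standard Spark SQL `CREATE OR REPLACE VIEW` statement; the helper names, the view name, and the query are hypothetical, and `spark` is assumed to be the notebook's SparkSession.

```python
def create_view_sql(view: str, select_query: str) -> str:
    """Build a Spark SQL statement that defines a view over a query."""
    query = select_query.strip().rstrip(";")
    return f"CREATE OR REPLACE VIEW {view} AS {query}"


def create_view(spark, view: str, select_query: str) -> None:
    """Register the view; no data is copied, only the query is stored."""
    spark.sql(create_view_sql(view, select_query))


# Hypothetical view over a hypothetical orders table.
print(create_view_sql(
    "dbo.open_orders",
    "SELECT order_id, customer_id FROM dbo.orders WHERE status = 'open';",
))
```

Because the view stores only its query, each read runs that query against the underlying tables, in contrast to a materialized view, which stores pre-computed results.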
Security
Item access or control plane security
Users can be assigned workspace roles (Admin, Member, Contributor, Viewer), each of which grants a different level of access to the Lakehouse and its contents. Users can also be granted access through the Lakehouse sharing capabilities.
Data access or OneLake Security
For data access, use the OneLake security model, which is based on Microsoft Entra ID (formerly Azure Active Directory) and role-based access control (RBAC). Lakehouse data is stored in OneLake, so access to data is controlled through OneLake permissions. In addition to object-level permissions, Lakehouse also supports column-level and row-level security for tables, allowing fine-grained control over who can see specific columns or rows in a table.
Lakehouse Shortcuts
Shortcuts create virtual links to data without copying it.
Types of Shortcuts
- Internal: Link to other Fabric Lakehouses/tables, cross-workspace data sharing
- ADLS Gen2: Link to ADLS Gen2 containers in Azure
- Amazon S3: AWS S3 buckets, cross-cloud data access
- Dataverse: Microsoft Dataverse, business application data
- Google Cloud Storage: GCS buckets, cross-cloud data access
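To illustrate the difference between an internal and an external shortcut, the sketch below builds request bodies in the shape used by the Fabric REST API "Create Shortcut" operation, as far as I understand it. The field names (`oneLake`, `adlsGen2`, `connectionId`, and so on) should be verified against the current API reference, and every ID, name, and URL is a placeholder. No request is sent.

```python
import json


def internal_shortcut(name: str, parent_path: str, workspace_id: str,
                      item_id: str, target_path: str) -> dict:
    """Body for a shortcut that points at another Fabric item in OneLake."""
    return {
        "path": parent_path,  # where the shortcut appears, e.g. "Tables"
        "name": name,
        "target": {
            "oneLake": {
                "workspaceId": workspace_id,
                "itemId": item_id,
                "path": target_path,
            }
        },
    }


def adls_shortcut(name: str, parent_path: str, account_url: str,
                  subpath: str, connection_id: str) -> dict:
    """Body for a shortcut that points at an ADLS Gen2 container or folder."""
    return {
        "path": parent_path,
        "name": name,
        "target": {
            "adlsGen2": {
                "location": account_url,
                "subpath": subpath,
                "connectionId": connection_id,
            }
        },
    }


# Placeholder values, for illustration only.
body = internal_shortcut("orders", "Tables", "<workspace-id>",
                         "<lakehouse-id>", "Tables/orders")
print(json.dumps(body, indent=2))
```

Either body would be posted to the shortcuts endpoint of the Lakehouse that should contain the shortcut; the data itself stays in the target location.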
Performance Optimization
V-Order Optimization
To speed up reads by the semantic model, enable V-Order optimization on Delta tables. V-Order is applied when data is written: it sorts and lays out the data in the underlying Parquet files in a way that improves query performance for common access patterns.
Table Optimization
Tables can also be optimized using the OPTIMIZE command, which compacts small files into larger ones and can apply Z-ordering to improve query performance on specific columns. Regular optimization helps maintain performance as data is ingested and updated over time. The VACUUM command removes old files that the table no longer references and frees up storage space, especially after updates and deletes.
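The maintenance commands above can be sketched as a small routine that builds and runs the statements. The `ZORDER BY` and `VACUUM ... RETAIN n HOURS` clauses are standard Delta Lake SQL; the trailing `VORDER` keyword is how I understand Fabric exposes V-Order in OPTIMIZE and should be checked against the Fabric documentation. The helper names and the 168-hour guard (the Delta default retention of 7 days) are my own choices, and `spark` is assumed to be the notebook's SparkSession.

```python
from typing import Sequence

MIN_RETAIN_HOURS = 168  # Delta Lake's default retention threshold (7 days)


def optimize_sql(table: str, zorder_by: Sequence[str] = (), vorder: bool = False) -> str:
    """Build an OPTIMIZE statement, optionally with Z-ordering and V-Order."""
    stmt = f"OPTIMIZE {table}"
    if zorder_by:
        stmt += " ZORDER BY (" + ", ".join(zorder_by) + ")"
    if vorder:
        stmt += " VORDER"  # assumption: Fabric-specific keyword
    return stmt


def vacuum_sql(table: str, retain_hours: int = MIN_RETAIN_HOURS) -> str:
    """Build a VACUUM statement; refuse retention below the default."""
    if retain_hours < MIN_RETAIN_HOURS:
        raise ValueError("retention below 168 hours risks breaking time travel")
    return f"VACUUM {table} RETAIN {retain_hours} HOURS"


def run_maintenance(spark, table: str, zorder_by: Sequence[str] = ()) -> None:
    """Compact the table first, then remove files it no longer references."""
    spark.sql(optimize_sql(table, zorder_by, vorder=True))
    spark.sql(vacuum_sql(table))


print(optimize_sql("dbo.orders", ["customer_id"], vorder=True))
# OPTIMIZE dbo.orders ZORDER BY (customer_id) VORDER
print(vacuum_sql("dbo.orders"))
# VACUUM dbo.orders RETAIN 168 HOURS
```

Running OPTIMIZE before VACUUM means the small files replaced by compaction become candidates for cleanup once they age past the retention window.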
Lineage
The Lakehouse item supports lineage, which allows users to track the origin and transformations of data. Lineage information is automatically captured for tables and files in Lakehouse, showing how data flows from source to destination. This helps with debugging, auditing, and understanding data dependencies.
PySpark Code Examples
See references/pyspark.md for PySpark code examples.
Getting data into Lakehouse
See references/getdata.md for guidance on getting data into a Lakehouse.