AI Skill

Official

Pass

Audit score 90

airunway-aks-setup

microsoft/azure-skills

Set up AI Runway on AKS from bare cluster to running model in six steps.

What is airunway-aks-setup?

This skill guides you through installing AI Runway on an existing AKS cluster, including controller setup, GPU assessment, inference provider configuration, and first model deployment. Use it when you need end-to-end AI Runway onboarding or to resume a partially-complete setup.

Verify cluster connectivity, node inventory, and GPU detection
Install AI Runway controller and custom resource definitions (CRDs)
Assess GPU hardware compatibility and flag dtype/attention constraints
Recommend and install an inference provider (KAITO, Dynamo, or KubeRay)
Deploy and verify your first AI model on AKS
Report cluster health status at each step

How to install airunway-aks-setup

npx skills add https://github.com/microsoft/azure-skills --skill airunway-aks-setup

Prerequisites

An existing AKS cluster (provision via azure-kubernetes skill if needed)
kubectl CLI configured with cluster context
make and curl CLI tools available
Understanding of GPU costs (A100-80GB: $3–5+/hr)

Claude Code

Cursor

Windsurf

Cline

How to use airunway-aks-setup

1.Run step 1 to verify cluster connectivity, node inventory, and GPU detection
2.Run step 2 to install the AI Runway controller and CRDs
3.Run step 3 to assess GPU hardware compatibility and constraints
4.Run step 4 to choose and install an inference provider
5.Run step 5 to deploy and verify your first model
6.Run step 6 to review summary and next steps
7.Use skip-to-step N argument to resume from a specific phase if setup was interrupted

Use cases

Good for

Setting up AI Runway on an existing AKS cluster from scratch
Installing the AI Runway controller and CRDs for model serving
Assessing GPU hardware compatibility before deploying inference workloads
Choosing and installing an inference provider for your cluster
Deploying a first AI model to AKS via AI Runway

Who it's for

Platform engineers setting up AKS for AI workloads
DevOps teams onboarding model serving infrastructure
ML engineers deploying inference on Kubernetes
Azure administrators managing GPU clusters

airunway-aks-setup FAQ

Do I need a GPU node pool?

No, but GPU is required for efficient model inference. CPU-only inference is acceptable for testing. GPU node pools incur significant compute charges.

What if my cluster doesn't exist yet?

Use the azure-kubernetes skill first to provision an AKS cluster (optionally with a GPU node pool), then return to this skill.

Can I skip steps or resume from the middle?

Yes. Provide skip-to-step N to start at a specific phase; prior steps are assumed complete.

What inference providers are supported?

Step 4 recommends and installs from KAITO, Dynamo, or KubeRay based on your cluster configuration.

What should I do if the controller is in CrashLoopBackOff?

Check controller logs with kubectl logs -n airunway-system -l control-plane=controller-manager --previous to diagnose config or RBAC issues.

Full instructions (SKILL.md)

Source of truth, from microsoft/azure-skills.

name: airunway-aks-setup description: "Set up AI Runway on AKS — from bare cluster to running model. Covers cluster verification, controller install, GPU assessment, provider setup, and first deployment. WHEN: "setup AI Runway", "onboard AKS cluster", "install AI Runway", "airunway setup", "deploy model to AKS", "GPU inference on AKS", "KAITO setup on AKS", "run LLM on AKS", "vLLM on AKS", "set up model serving on AKS", "AI Runway controller"." license: MIT metadata: author: Microsoft version: "1.0.1" argument-hint: "[skip-to-step N]"

AI Runway AKS Setup

This skill walks users from a bare Kubernetes cluster to a running AI model deployment. Follow each step in sequence unless the user provides skip-to-step N to resume from a specific phase.

Cost awareness: GPU node pools incur significant compute charges (A100-80GB can cost $3–5+/hr). Confirm the user understands cost implications before provisioning GPU resources.

Prerequisites

This skill assumes an AKS cluster already exists. If the user does not have a cluster, hand off to the azure-kubernetes skill first to provision one (with a GPU node pool unless CPU-only inference is acceptable), then return here.

Quick Reference

Property	Value
Best for	End-to-end AI Runway onboarding on AKS
CLI tools	`kubectl`, `make`, `curl`
MCP tools	None
Related skills	`azure-kubernetes` (cluster setup), `azure-diagnostics` (troubleshooting)

When to Use This Skill

Use this skill when the user wants to:

Set up AI Runway on an existing AKS cluster from scratch
Install the AI Runway controller and CRDs
Assess GPU hardware compatibility for model deployment
Choose and install an inference provider (KAITO, Dynamo, KubeRay)
Deploy their first AI model to AKS via AI Runway
Resume a partially-complete AI Runway setup from a specific step

MCP Tools

This skill uses no MCP tools. All cluster operations are performed directly via kubectl and make.

Rules

Execute steps in sequence — load the reference for each step as you reach it
Report cluster state at each step: ✓ healthy, ✗ missing/failed
Ask for user confirmation before any install or deployment action
If a step is already complete, report status and skip to the next step
If the user provides skip-to-step N, start at step N; assume prior steps are complete

Steps

#	Step	Reference
1	Cluster Verification — context check, node inventory, GPU detection	step-1-verify.md
2	Controller Installation — CRD + controller deployment	step-2-controller.md
3	GPU Assessment — detect GPU models, flag dtype/attention constraints	step-3-gpu.md
4	Provider Setup — recommend and install inference provider	step-4-provider.md
5	First Deployment — pick a model, deploy, verify Ready	step-5-deploy.md
6	Summary — recap, smoke test, next steps	step-6-summary.md

Error Handling

Error / Symptom	Likely Cause	Remediation
No kubeconfig context	Not connected to a cluster	Run `az aks get-credentials` or equivalent
Controller in CrashLoopBackOff	Config or RBAC issue	`kubectl logs -n airunway-system -l control-plane=controller-manager --previous`
Provider not ready	Image pull or RBAC issue	`kubectl logs <pod-name> -n <namespace>` for the provider pod
ModelDeployment stuck in Pending	GPU scheduling failure or provider not ready	`kubectl describe modeldeployment <name> -n <namespace>` events
`bfloat16` errors at inference	T4 or V100 lacks bfloat16 support	Add `--dtype float16` to serving args

microsoft/azure-skills

Azure Storage skill: Blob, File Shares, Queue, Table, and Data Lake with access tier guidance and lifecycle management

420k installsAudited