Azure Databricks for ML Tutorial: Unified Analytics Platform


By Ruby Abdullah · tutorial
Azure · Databricks · Spark · MLOps · Big Data · Machine Learning

Complete Azure Databricks for ML Tutorial: Unified Analytics Platform

Azure Databricks provides a collaborative Apache Spark-based analytics platform optimized for machine learning. It combines data engineering, data science, and machine learning on a unified platform.

Why Azure Databricks for ML?

Key Benefits:
  • Unified platform: Data engineering and ML in one place
  • Collaborative: Notebooks with real-time collaboration
  • Scalable: Auto-scaling Spark clusters
  • MLflow integration: Built-in experiment tracking
  • Delta Lake: Reliable data lake storage

Key Components:
  • Databricks Workspace
  • Spark Clusters
  • Notebooks
  • MLflow
  • Feature Store
  • Model Serving

Prerequisites

pip install databricks-sdk mlflow azure-mgmt-databricks azure-identity

# Azure CLI
az login
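Before moving on, it can help to confirm that the packages imported later in this tutorial are actually installed. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is simply the set of imports that appear in the snippets below.

```python
import importlib.util


def missing_modules(module_names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in module_names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised when a parent package (e.g. "azure") is itself absent.
            found = False
        if not found:
            missing.append(name)
    return missing


# Modules imported by the snippets in this tutorial.
required = ["databricks.sdk", "mlflow", "azure.identity", "azure.mgmt.databricks"]
print("Missing modules:", missing_modules(required) or "none")
```

Any name reported as missing points to a package that still needs a `pip install`.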

Setup

1. Create Databricks Workspace

from azure.identity import DefaultAzureCredential
from azure.mgmt.databricks import AzureDatabricksManagementClient

credential = DefaultAzureCredential()
client = AzureDatabricksManagementClient(
    credential=credential,
    subscription_id="your-subscription-id"
)

# Create workspace (begin_create_or_update returns a poller; .result() blocks until it finishes)
workspace = client.workspaces.begin_create_or_update(
    resource_group_name="my-resource-group",
    workspace_name="my-databricks-workspace",
    parameters={
        "location": "eastus",
        "sku": {"name": "premium"}
    }
).result()

print(f"Workspace created: {workspace.name}")
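The `parameters` dictionary is easy to get wrong, so building it in one place can help. The sketch below is an assumption-laden illustration, not part of the original tutorial: the helper names are hypothetical, and it assumes the Azure API also expects a managed resource group ID under `properties` (worth checking against the workspace API reference for your API version).

```python
VALID_SKUS = {"standard", "premium", "trial"}


def managed_resource_group_id(subscription_id: str, group_name: str) -> str:
    """Build the ARM resource ID of a resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{group_name}"


def build_workspace_parameters(subscription_id: str, location: str,
                               managed_group_name: str, sku: str = "premium") -> dict:
    """Assemble the request body for workspace creation (REST-style key names)."""
    if sku not in VALID_SKUS:
        raise ValueError(f"sku must be one of {sorted(VALID_SKUS)}, got {sku!r}")
    return {
        "location": location,
        "sku": {"name": sku},
        # Assumption: the service wants a managed resource group for cluster resources.
        "properties": {
            "managedResourceGroupId": managed_resource_group_id(
                subscription_id, managed_group_name
            )
        },
    }


# Placeholder values, matching the style of the snippet above.
params = build_workspace_parameters(
    "your-subscription-id", "eastus", "my-databricks-managed-rg"
)
print(params["properties"]["managedResourceGroupId"])
```

The resulting dictionary could be passed as the `parameters` argument of the workspace creation call.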

2. Connect with Databricks SDK

from databricks.sdk import WorkspaceClient

# Initialize client
w = WorkspaceClient(
    host="https://adb-xxxxx.azuredatabricks.net",
    token="dapi-xxxxx"
)

# List clusters
clusters = w.clusters.list()
for cluster in clusters:
    print(f"{cluster.cluster_name}: {cluster.state}")
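Hardcoding a token in source code is risky. As a sketch of one alternative, the host and token can be read from the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables, which the Databricks SDK also recognizes on its own. The helper `load_databricks_config` is hypothetical and only validates and returns the values.

```python
import os
from typing import Mapping, Optional


def load_databricks_config(env: Optional[Mapping[str, str]] = None) -> dict:
    """Read the workspace host and token from environment variables."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST", "").strip()
    token = env.get("DATABRICKS_TOKEN", "").strip()
    missing = [name for name, value in
               (("DATABRICKS_HOST", host), ("DATABRICKS_TOKEN", token))
               if not value]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    if not host.startswith("https://"):
        raise ValueError("DATABRICKS_HOST should start with https://")
    return {"host": host.rstrip("/"), "token": token}


# Example with a fake environment; both values are placeholders.
config = load_databricks_config({
    "DATABRICKS_HOST": "https://adb-xxxxx.azuredatabricks.net/",
    "DATABRICKS_TOKEN": "dapi-xxxxx",
})
print(config["host"])
# With the SDK installed, this could be used as: w = WorkspaceClient(**config)
```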

Cluster Management

1. Create ML Cluster

from databricks.sdk.service.compute import (
    AutoScale,
    AzureAttributes,
    AzureAvailability
)

# Create cluster (.result() waits until the cluster is running)
cluster = w.clusters.create(
    cluster_name="ml-cluster",
    # CPU ML runtime key; w.clusters.spark_versions() lists the keys valid in your workspace
    spark_version="13.3.x-cpu-ml-scala2.12",
    node_type_id="Standard_DS3_v2",
    autoscale=AutoScale(min_workers=1, max_workers=8),
    azure_attributes=AzureAttributes(
        availability=AzureAvailability.ON_DEMAND_AZURE,
        first_on_demand=1
    ),
    spark_conf={
        "spark.databricks.delta.preview.enabled": "true"
    },
    custom_tags={
        "project": "ml-training",
        "team": "data-science"
    }
).result()

print(f"Cluster ID: {cluster.cluster_id}")
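Cluster settings are often kept as plain data so they can be reviewed and validated before anything is created. The sketch below builds a dictionary using the field names of the Clusters API and checks the autoscale bounds. The helper `build_cluster_spec` is hypothetical, and the `autotermination_minutes` default of 60 is an illustrative choice, not a value from the original tutorial.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id,
                       min_workers, max_workers,
                       tags=None, autotermination_minutes=60):
    """Return a cluster definition as a plain dict, validating autoscale bounds."""
    if not 0 <= min_workers <= max_workers:
        raise ValueError("need 0 <= min_workers <= max_workers")
    if autotermination_minutes < 0:
        raise ValueError("autotermination_minutes must not be negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many idle minutes to limit cost.
        "autotermination_minutes": autotermination_minutes,
        "custom_tags": dict(tags or {}),
    }


spec = build_cluster_spec(
    "ml-cluster", "13.3.x-cpu-ml-scala2.12", "Standard_DS3_v2",
    min_workers=1, max_workers=8,
    tags={"project": "ml-training", "team": "data-science"},
)
print(json.dumps(spec, indent=2))
```

Keeping the definition as data also makes it straightforward to store in version control next to the training code.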

2. GPU Cluster for Deep Learning

gpu_cluster = w.clusters.create(
    cluster_name="gpu-ml-cluster",
    spark_version="13.3.x-gpu-ml-scala2.12",
    node_type_id="Standard_NC6s_v3",
    num_workers=2,
    spark_conf={
        # Each Spark task reserves one whole GPU
        "spark.task.resource.gpu.amount": "1"
    }
).result()
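The `spark.task.resource.gpu.amount` setting controls how much of a GPU each Spark task reserves, which in turn caps how many GPU tasks can run at once. The sketch below estimates that cap under stated assumptions: one executor per worker that owns all of the worker's GPUs, and one GPU per worker for this node type. The helper name is hypothetical, and CPU cores can lower the real number further.

```python
import math


def max_concurrent_gpu_tasks(num_workers: int, gpus_per_worker: int,
                             task_gpu_amount: float) -> int:
    """Upper bound on simultaneously running GPU tasks across the cluster."""
    if num_workers < 0 or gpus_per_worker < 0:
        raise ValueError("worker and GPU counts must not be negative")
    if task_gpu_amount <= 0:
        raise ValueError("task_gpu_amount must be positive")
    # Small epsilon guards against floating-point error in the division.
    per_worker = math.floor(gpus_per_worker / task_gpu_amount + 1e-9)
    return num_workers * per_worker


# Two workers, one GPU each (assumed), one full GPU per task.
print(max_concurrent_gpu_tasks(2, 1, 1.0))
# Four tasks sharing each GPU.
print(max_concurrent_gpu_tasks(2, 1, 0.25))
```

With a value of 1, as in the cluster above, each worker runs one GPU task at a time, which suits training jobs that need the whole device.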

Notebooks and Data

1. Create Notebook

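import base64

from databricks.sdk.service.workspace import ImportFormat, Language

# Placeholder notebook source; replace with your own code
notebook_content = "print('hello from Databricks')"

# Create the target folder (mkdirs does not return a notebook object)
w.workspace.mkdirs("/Users/user@company.com/ml-projects")

# Import notebook (the SDK method is import_, because import is a Python keyword)
w.workspace.import_(
    path="/Users/user@company.com/ml-projects/training",
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    content=base64.b64encode(notebook_content.encode()).decode()
)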
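The import call expects the notebook body as a base64 string. As a sketch of how that payload could be prepared, the helpers below join several cells into Databricks source format (a `# Databricks notebook source` header and `# COMMAND ----------` separators) and encode the result. The helper names and the sample cells are hypothetical.

```python
import base64

HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "\n\n# COMMAND ----------\n\n"


def build_notebook_source(cells):
    """Join code cells into a single Python notebook in source format."""
    return HEADER + "\n" + CELL_SEPARATOR.join(cells) + "\n"


def encode_notebook(source: str) -> str:
    """Base64-encode notebook source for the workspace import call."""
    return base64.b64encode(source.encode("utf-8")).decode("ascii")


def decode_notebook(content: str) -> str:
    """Reverse of encode_notebook, handy for checking what was uploaded."""
    return base64.b64decode(content).decode("utf-8")


source = build_notebook_source([
    "import mlflow",
    "print('training starts here')",
])
payload = encode_notebook(source)
print(decode_notebook(payload) == source)
```

The `payload` string is what would go into the `content` argument of the import call.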

2. Working with Delta Lake

# In Databricks notebook
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read data
df = spark.read.format("csv").option("header", "true").load("dbfs:/data/train.csv")

# Write to Delta Lake
df.write.format("delta").mode("overwrite").save("/delta/trainingdata")

# Read from Delta Lake
delta_df = spark.read.format("delta").load("/delta/trainingdata")

# Create Delta table
spark.sql("""
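The snippet breaks off in the source before the SQL statement, so the author's exact query is not known. One common way to register an existing Delta location as a table is a `CREATE TABLE ... USING DELTA LOCATION` statement. The sketch below only builds that SQL string; the table name `training_data` and the helper `build_create_table_sql` are assumptions, not the original code.

```python
def build_create_table_sql(table_name: str, location: str) -> str:
    """Return a statement that registers a Delta location as a table."""
    # Reject quote characters so the interpolated values cannot break the SQL.
    for value in (table_name, location):
        if "'" in value or "`" in value:
            raise ValueError(f"unsupported character in {value!r}")
    return (
        f"CREATE TABLE IF NOT EXISTS `{table_name}` "
        f"USING DELTA LOCATION '{location}'"
    )


sql = build_create_table_sql("training_data", "/delta/trainingdata")
print(sql)
# In a Databricks notebook, this string could then be passed to spark.sql(sql).
```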
