Azure Databricks Tutorial for ML: Unified Analytics Platform


By Ruby Abdullah · tutorial
Azure · Databricks · Spark · MLOps · Big Data · Machine Learning

Complete Azure Databricks Tutorial for ML: A Unified Analytics Platform

Azure Databricks provides a collaborative analytics platform built on Apache Spark and optimized for machine learning. It brings data engineering, data science, and machine learning together in a single workspace.

Why Azure Databricks for ML?

Key Benefits:
  • Unified platform: data engineering and ML in one place
  • Collaborative: notebooks with real-time co-editing
  • Scalable: auto-scaling Spark clusters
  • MLflow integration: built-in experiment tracking
  • Delta Lake: reliable data lake storage with ACID transactions

Main Components:
  • Databricks Workspace
  • Spark Clusters
  • Notebooks
  • MLflow
  • Feature Store
  • Model Serving

Prerequisites

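pip install databricks-sdk mlflow azure-mgmt-databricks azure-identity

# Sign in with the Azure CLI (DefaultAzureCredential can reuse this login)
az login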

Setup

1. Create a Databricks Workspace

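from azure.identity import DefaultAzureCredential
from azure.mgmt.databricks import AzureDatabricksManagementClient
from azure.mgmt.databricks.models import Sku, Workspace

subscription_id = "your-subscription-id"

credential = DefaultAzureCredential()
client = AzureDatabricksManagementClient(
    credential=credential,
    subscription_id=subscription_id,
)

# Create the workspace (long-running operation; .result() blocks until it finishes)
workspace = client.workspaces.begin_create_or_update(
    resource_group_name="my-resource-group",
    workspace_name="my-databricks-workspace",
    parameters=Workspace(
        location="eastus",
        sku=Sku(name="premium"),
        # Required: resource group that Databricks manages for the workspace
        # (the name below is only an example)
        managed_resource_group_id=(
            f"/subscriptions/{subscription_id}/resourceGroups/my-databricks-managed-rg"
        ),
    ),
).result()

print(f"Workspace created: {workspace.name}")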
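Azure identifies both the workspace and its required managed resource group by ARM resource IDs. The small helpers below keep those strings consistent. This is a sketch that assumes the standard `/subscriptions/<id>/resourceGroups/<name>/providers/...` ARM ID layout. The function names and the example resource names are hypothetical.

```python
def resource_group_id(subscription_id: str, resource_group: str) -> str:
    """Build the ARM ID of a resource group."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def databricks_workspace_id(subscription_id: str, resource_group: str, workspace_name: str) -> str:
    """Build the ARM ID of an Azure Databricks workspace (Microsoft.Databricks/workspaces)."""
    return (
        resource_group_id(subscription_id, resource_group)
        + f"/providers/Microsoft.Databricks/workspaces/{workspace_name}"
    )


if __name__ == "__main__":
    sub = "00000000-0000-0000-0000-000000000000"  # placeholder subscription ID
    # Hypothetical name for the managed resource group Databricks will use
    print(resource_group_id(sub, "my-databricks-managed-rg"))
    print(databricks_workspace_id(sub, "my-resource-group", "my-databricks-workspace"))
```

The first helper's output is the shape of value that `managed_resource_group_id` expects.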

2. Connect with the Databricks SDK

from databricks.sdk import WorkspaceClient

# Initialize the client (token is a Databricks personal access token)
w = WorkspaceClient(
    host="https://adb-xxxxx.azuredatabricks.net",
    token="dapi-xxxxx",
)

# List clusters
clusters = w.clusters.list()
for cluster in clusters:
    print(f"{cluster.cluster_name}: {cluster.state}")
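Hard-coding a token in source code is risky. When `WorkspaceClient()` is called with no arguments, it can read `DATABRICKS_HOST` and `DATABRICKS_TOKEN` from the environment. The sketch below is a hypothetical pre-flight helper, `read_databricks_env`, that checks those variables before you create a client. It only inspects the environment and never contacts Databricks.

```python
import os
from typing import Dict, Mapping, Optional
from urllib.parse import urlparse


def read_databricks_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read DATABRICKS_HOST / DATABRICKS_TOKEN and check that the host looks like a URL."""
    env = os.environ if env is None else env
    host = env.get("DATABRICKS_HOST", "").strip()
    token = env.get("DATABRICKS_TOKEN", "").strip()

    missing = [name for name, value in (("DATABRICKS_HOST", host), ("DATABRICKS_TOKEN", token)) if not value]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")

    parsed = urlparse(host)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"DATABRICKS_HOST should be an https:// URL, got {host!r}")

    # Drop a trailing slash so the host is stored in one canonical form
    return {"host": host.rstrip("/"), "token": token}


# Hypothetical usage: pass the checked values explicitly ...
# cfg = read_databricks_env()
# w = WorkspaceClient(host=cfg["host"], token=cfg["token"])
# ... or call WorkspaceClient() and let the SDK read the same variables itself.
```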

Cluster Management

1. Create an ML Cluster

from databricks.sdk.service.compute import (
    AutoScale,
    AzureAttributes,
    AzureAvailability,
)

# Create the cluster; .result() waits until it is running
cluster = w.clusters.create(
    cluster_name="ml-cluster",
    spark_version="13.3.x-cpu-ml-scala2.12",  # Databricks Runtime 13.3 LTS ML (CPU)
    node_type_id="Standard_DS3_v2",
    autoscale=AutoScale(min_workers=1, max_workers=8),
    azure_attributes=AzureAttributes(
        availability=AzureAvailability.ON_DEMAND_AZURE,
        first_on_demand=1,
    ),
    spark_conf={
        "spark.databricks.delta.preview.enabled": "true",
    },
    custom_tags={
        "project": "ml-training",
        "team": "data-science",
    },
).result()

print(f"Cluster ID: {cluster.cluster_id}")
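`w.clusters.create(...)` is a wrapper around the Databricks Clusters REST API, and the request body uses the same snake_case field names. The sketch below builds that body as a plain dictionary, which can help when you review or log the configuration. The helper name is hypothetical. The runtime key follows the ML CPU naming (`13.3.x-cpu-ml-scala2.12`); check which keys your workspace offers with `w.clusters.spark_versions()`.

```python
import json
from typing import Dict, Optional


def build_ml_cluster_payload(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    min_workers: int,
    max_workers: int,
    custom_tags: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a clusters/create request body mirroring the SDK call above."""
    if not 0 <= min_workers <= max_workers:
        raise ValueError("expected 0 <= min_workers <= max_workers")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # With autoscale, Databricks varies the worker count between these bounds
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "azure_attributes": {"availability": "ON_DEMAND_AZURE", "first_on_demand": 1},
        "spark_conf": {"spark.databricks.delta.preview.enabled": "true"},
        "custom_tags": dict(custom_tags or {}),
    }


payload = build_ml_cluster_payload(
    "ml-cluster", "13.3.x-cpu-ml-scala2.12", "Standard_DS3_v2", 1, 8,
    {"project": "ml-training", "team": "data-science"},
)
print(json.dumps(payload, indent=2))
```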

2. GPU Cluster for Deep Learning

gpu_cluster = w.clusters.create(
    cluster_name="gpu-ml-cluster",
    spark_version="13.3.x-gpu-ml-scala2.12",
    node_type_id="Standard_NC6s_v3",  # one NVIDIA V100 GPU per node
    num_workers=2,
    spark_conf={
        # Each Spark task requests one GPU
        "spark.task.resource.gpu.amount": "1",
    },
).result()
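`spark.task.resource.gpu.amount` tells Spark how many GPUs each task needs, so it caps how many GPU tasks can run at the same time. A fractional value lets several tasks share one GPU. The back-of-the-envelope sketch below assumes GPUs are the only limiting resource; it ignores CPU cores and `spark.task.cpus`, which can lower the real number. The helper name is hypothetical.

```python
import math
from fractions import Fraction


def max_concurrent_gpu_tasks(num_workers: int, gpus_per_worker: int, gpu_amount_per_task: float) -> int:
    """Upper bound on concurrently running GPU tasks across all workers (GPU limit only)."""
    amount = Fraction(str(gpu_amount_per_task))  # exact arithmetic avoids float rounding
    if amount <= 0:
        raise ValueError("gpu_amount_per_task must be positive")
    per_worker = math.floor(Fraction(gpus_per_worker) / amount)
    return num_workers * per_worker


print(max_concurrent_gpu_tasks(2, 1, 1))     # → 2  (2 x Standard_NC6s_v3, one GPU per task)
print(max_concurrent_gpu_tasks(2, 1, 0.25))  # → 8  (four tasks share each GPU)
```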

Notebooks and Data

1. Create a Notebook

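import base64

from databricks.sdk.service.workspace import ImportFormat, Language

# Create the project folder (mkdirs returns nothing)
w.workspace.mkdirs("/Users/user@company.com/ml-projects")

# Notebook source to upload (example content)
notebook_content = "# Databricks notebook source\nprint('training')\n"

# Import the notebook; the SDK method is import_ because import is a Python keyword
w.workspace.import_(
    path="/Users/user@company.com/ml-projects/training",
    format=ImportFormat.SOURCE,
    language=Language.PYTHON,
    content=base64.b64encode(notebook_content.encode()).decode(),
)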
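With the SOURCE format and the PYTHON language, the uploaded content is a plain Python file. Databricks marks it with a first line `# Databricks notebook source` and separates cells with `# COMMAND ----------`. The sketch below assembles that text from a list of cells and base64-encodes it for the `content` argument. The helper names and cell contents are illustrative assumptions.

```python
import base64
from typing import Sequence

HEADER = "# Databricks notebook source"
CELL_SEPARATOR = "# COMMAND ----------"


def build_python_notebook_source(cells: Sequence[str]) -> str:
    """Join code cells into Databricks' Python source-notebook layout."""
    body = f"\n\n{CELL_SEPARATOR}\n\n".join(cell.strip() for cell in cells)
    return f"{HEADER}\n{body}\n"


def encode_for_import(source: str) -> str:
    """Base64-encode notebook text for the workspace import `content` field."""
    return base64.b64encode(source.encode("utf-8")).decode("ascii")


notebook_content = build_python_notebook_source([
    "import mlflow",
    "print(mlflow.__version__)",
])
encoded = encode_for_import(notebook_content)
print(notebook_content)
```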

2. Working with Delta Lake

# In a Databricks notebook, `spark` already exists; getOrCreate() returns that session
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read data (inferSchema so numeric columns are not loaded as strings)
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("dbfs:/data/train.csv")
)

# Write to Delta Lake
df.write.format("delta").mode("overwrite").save("/delta/training_data")

# Read from Delta Lake
delta_df = spark.read.format("delta").load("/delta/training_data")
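Each write to a Delta table is recorded as a new commit in the table's transaction log (`_delta_log`). That log is what enables ACID writes and time travel, for example `spark.read.format("delta").option("versionAsOf", 0).load(path)`. The toy model below is *not* Delta Lake's implementation or protocol. It is a hypothetical in-memory sketch of the idea: an overwrite appends a new commit instead of destroying data, and replaying the log up to a version reconstructs that version.

```python
from typing import List, Optional, Tuple

Action = Tuple[str, Optional[dict]]  # ("add", row) or ("remove_all", None)


class ToyTransactionLog:
    """Hypothetical in-memory commit log with replay (illustration only)."""

    def __init__(self) -> None:
        self.commits: List[List[Action]] = []  # commit i corresponds to table version i

    def write(self, rows: List[dict], mode: str = "append") -> int:
        """Append one commit and return its version number."""
        if mode not in ("append", "overwrite"):
            raise ValueError(f"unsupported mode: {mode}")
        actions: List[Action] = [("remove_all", None)] if mode == "overwrite" else []
        actions += [("add", dict(row)) for row in rows]
        self.commits.append(actions)  # older commits are never modified
        return len(self.commits) - 1

    def read(self, version_as_of: Optional[int] = None) -> List[dict]:
        """Replay the log up to a version (latest by default)."""
        if not self.commits:
            return []
        last = len(self.commits) - 1 if version_as_of is None else version_as_of
        if not 0 <= last < len(self.commits):
            raise IndexError(f"version {last} does not exist")
        rows: List[dict] = []
        for actions in self.commits[: last + 1]:
            for kind, row in actions:
                if kind == "remove_all":
                    rows = []
                else:
                    rows.append(dict(row))
        return rows


log = ToyTransactionLog()
log.write([{"id": 1}, {"id": 2}], mode="overwrite")  # version 0
log.write([{"id": 3}])                               # version 1 (append)
log.write([{"id": 9}], mode="overwrite")             # version 2
print(log.read())                 # → [{'id': 9}]
print(log.read(version_as_of=1))  # → [{'id': 1}, {'id': 2}, {'id': 3}]
```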

# Create a Delta table

spark.sql("""
