Tutorial Lengkap BigQuery ML: Machine Learning dengan SQL

# Tutorial Lengkap BigQuery ML: Machine Learning dengan SQL BigQuery ML memungkinkan Anda membuat dan mengeksekusi model machine learning menggunakan query SQL. Tidak perlu memindahkan data - trainin...

By Ruby Abdullah · · tutorial
GCPBigQueryBigQuery MLSQLMachine LearningAnalytics

Tutorial Lengkap BigQuery ML: Machine Learning dengan SQL

BigQuery ML memungkinkan Anda membuat dan mengeksekusi model machine learning menggunakan query SQL. Tidak perlu memindahkan data - training dan prediksi langsung di data warehouse Anda.

Mengapa BigQuery ML?

Manfaat Utama:
  • Berbasis SQL: Gunakan sintaks SQL yang familiar
  • Tanpa pemindahan data: Training pada data di tempat
  • Scalable: Manfaatkan infrastruktur BigQuery
  • Terintegrasi: Bekerja dengan ekosistem BigQuery
  • Multiple models: Classification, regression, clustering

Model yang Didukung:
  • Linear/Logistic Regression
  • XGBoost, Random Forest
  • Deep Neural Networks
  • K-means Clustering
  • Time Series (ARIMA)
  • Matrix Factorization

Prerequisites

pip install google-cloud-bigquery

gcloud auth login

Quick Start

1. Buat Classification Model

-- Buat logistic regression model

CREATE OR REPLACE MODEL project.dataset.churnmodel

OPTIONS(

modeltype='LOGISTICREG',

inputlabelcols=['churn'],

autoclassweights=TRUE

) AS

SELECT

age,

tenuremonths,

monthlycharges,

totalcharges,

contracttype,

churn

FROM project.dataset.customers

WHERE PARTITIONTIME BETWEEN '2024-01-01' AND '2024-06-01';

2. Evaluasi Model

-- Evaluasi performa model

SELECT

FROM

ML.EVALUATE(MODEL project.dataset.churnmodel,

(SELECT FROM project.dataset.customers

WHERE PARTITIONTIME > '2024-06-01'));

3. Buat Prediksi

-- Prediksi probabilitas churn

SELECT

customerid,

predictedchurn,

predictedchurnprobs

FROM

ML.PREDICT(MODEL project.dataset.churnmodel,

(SELECT FROM project.dataset.newcustomers));

Tipe Model

1. XGBoost Classifier

CREATE OR REPLACE MODEL project.dataset.xgbclassifier

OPTIONS(

modeltype='BOOSTEDTREECLASSIFIER',

numparalleltree=5,

maxiterations=50,

learnrate=0.1,

inputlabelcols=['target']

) AS

SELECT FROM project.dataset.trainingdata;

2. Random Forest

CREATE OR REPLACE MODEL project.dataset.rfmodel

OPTIONS(

modeltype='RANDOMFORESTCLASSIFIER',

numparalleltree=100,

maxtreedepth=10,

inputlabelcols=['label']

) AS

SELECT FROM project.dataset.trainingdata;

3. Deep Neural Network

CREATE OR REPLACE MODEL project.dataset.dnnmodel

OPTIONS(

modeltype='DNNCLASSIFIER',

hiddenunits=[128, 64, 32],

dropout=0.2,

batchsize=256,

inputlabelcols=['label']

) AS

SELECT FROM project.dataset.trainingdata;

4. Linear Regression

CREATE OR REPLACE MODEL project.dataset.pricepredictor

OPTIONS(

modeltype='LINEARREG',

inputlabelcols=['price'],

l2reg=0.1

) AS

SELECT

bedrooms,

bathrooms,

sqft,

location,

price

FROM project.dataset.houses;

5. K-Means Clustering

CREATE OR REPLACE MODEL project.dataset.customersegments

OPTIONS(

modeltype='KMEANS',

numclusters=5,

standardizefeatures=TRUE

) AS

SELECT

age,

income,

spendingscore,

purchasefrequency

FROM project.dataset.customers;

-- Dapatkan cluster assignments

SELECT

customerid,

CENTROIDID as segment

FROM

ML.PREDICT(MODEL project.dataset.customersegments,

Artikel Terkait

DuckDB: Database Analitik In-Process untuk Data Science

DuckDB: Database Analitik In-Process untuk Data Science DuckDB adalah database analitik in-process yang dirancang khusus...

Tutorial PostgreSQL Advanced untuk ML: Analytics dan Feature Engineering

Tutorial 17: PostgreSQL Lanjutan untuk Machine Learning Daftar Isi Pendahuluan Prasyarat Window Functions untuk Rekayasa...

Tutorial Lengkap Vertex AI: Platform ML Terpadu Google Cloud

Tutorial Lengkap Vertex AI: Platform ML Terpadu di Google Cloud Vertex AI adalah platform machine learning terpadu Googl...

Tutorial SHAP: Explainable AI dan Interpretasi Model

SHAP - Panduan Praktis Explainable AI dan Interpretabilitas Model Model machine learning makin sering dipakai untuk meng...