PandasAI: Data Analysis with Natural Language in Python

By Ruby Abdullah · tutorial
Tags: PandasAI, Data Analysis, Natural Language, Pandas, Python

Introduction

Imagine being able to ask your data questions in plain English, without writing complex SQL queries or remembering intricate Pandas syntax. That is exactly what PandasAI offers: a Python library that integrates Large Language Model (LLM) capabilities with Pandas DataFrames.

PandasAI enables data analysts, data scientists, and even non-technical stakeholders to perform data analysis simply by typing questions in natural language. The library automatically converts your questions into executable Python/Pandas code, runs it, and returns the results.
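The question-to-code-to-result loop described above can be sketched in a few lines. This is only an illustration of the general idea, not PandasAI's actual implementation: `fake_llm` is a hypothetical stand-in that returns a hard-coded snippet instead of calling a model, and `answer` is a hypothetical helper name.

```python
import pandas as pd


def fake_llm(question: str) -> str:
    """Hypothetical stand-in for an LLM: returns Pandas code as a string.

    A real model would generate this code from the question and the
    DataFrame's schema; here it is hard-coded for illustration.
    """
    return "result = df.loc[df['price'] > 100, 'product'].tolist()"


def answer(df: pd.DataFrame, question: str):
    """Ask the (fake) LLM for code, execute it, and return `result`."""
    code = fake_llm(question)
    namespace = {"df": df, "pd": pd}
    exec(code, namespace)  # runs generated code; unsafe for untrusted input
    return namespace["result"]


df = pd.DataFrame({"product": ["Laptop", "Mouse"], "price": [1200, 25]})
expensive = answer(df, "Which products cost more than 100?")
print(expensive)
```

Because the generated code is executed on your machine, it deserves the same caution as any code you did not write yourself.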

In this tutorial, we will explore how to use PandasAI comprehensively, from installation to building practical business analytics.

Prerequisites

Before getting started, make sure you have:

  • Python 3.9 or later
  • pip package manager
  • An OpenAI API key (or a local LLM served through Ollama)
  • Basic understanding of Pandas DataFrames
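The first items on this list can be checked from Python itself. The sketch below uses only the standard library; `check_prerequisites` is a hypothetical helper name, and it only reports whether the interpreter is new enough and whether Pandas can be imported.

```python
import importlib.util
import sys


def check_prerequisites(min_python=(3, 9)) -> dict:
    """Report whether the interpreter and Pandas meet the tutorial's needs."""
    return {
        "python_ok": sys.version_info >= min_python,
        "pandas_installed": importlib.util.find_spec("pandas") is not None,
    }


status = check_prerequisites()
print(status)
```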

Installation

Basic Installation

pip install pandasai

Installation with Additional Dependencies

# With plotting support
pip install "pandasai[plotting]"

# With Excel support
pip install "pandasai[excel]"

# Full installation
pip install "pandasai[all]"

Verify Installation

import pandasai

print(pandasai.__version__)
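If importing the package fails, it can be useful to check the installed version without importing it at all. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pandasai")
print(version if version else "pandasai is not installed in this environment")
```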

Setup and LLM Configuration

Using OpenAI

import os

from pandasai import SmartDataframe
from pandasai.llm.openai import OpenAI

# Set API key
os.environ["OPENAI_API_KEY"] = "sk-your-api-key-here"

# Or directly during initialization
llm = OpenAI(api_token="sk-your-api-key-here", model="gpt-4o")
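Hard-coding a key in a script makes it easy to leak through version control. One common alternative, sketched here with a hypothetical helper `read_api_key`, is to export the key in the shell and read it at runtime, failing with a clear message if it is missing.

```python
import os


def read_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    Raises a RuntimeError with a readable message if the variable is unset
    or empty.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key


# Only read the key when it is actually present, so this demo never fails.
if os.environ.get("OPENAI_API_KEY", "").strip():
    api_key = read_api_key()
    print("Key loaded, length:", len(api_key))
```

The returned string can then be passed to the LLM constructor in place of the literal key.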

Using Local LLMs with Ollama

If you want to run analysis offline or save on API costs, you can use Ollama:

# Install Ollama first
curl -fsSL https://ollama.ai/install.sh | sh

# Download models
ollama pull llama3
ollama pull codellama

from pandasai.llm.local_llm import LocalLLM

# Configure local LLM via Ollama
llm = LocalLLM(
    api_base="http://localhost:11434/v1",
    model="llama3"
)
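A local setup fails in confusing ways if the Ollama server is not running. The sketch below is a quick reachability check, assuming Ollama listens on its default port 11434 on the same machine. `ollama_is_running` is a hypothetical helper name, and it only confirms that some HTTP server answers with status 200 at that address.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434",
                      timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers with 200 at base_url, else False."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


print("Ollama reachable:", ollama_is_running())
```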

Using Azure OpenAI

from pandasai.llm.azure_openai import AzureOpenAI

llm = AzureOpenAI(
    api_token="your-azure-api-key",
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
    deployment_name="gpt-4o"
)

Basic Queries with SmartDataframe

Creating a SmartDataframe

import pandas as pd

from pandasai import SmartDataframe

# Create sample sales data
data = {
    "product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset",
                "Laptop", "Mouse", "Keyboard", "Monitor", "Headset"],
    "category": ["Electronics", "Accessories", "Accessories", "Electronics", "Accessories",
                 "Electronics", "Accessories", "Accessories", "Electronics", "Accessories"],
    "price": [1200, 25, 50, 400, 75,
              1300, 20, 45, 450, 80],
    "units_sold": [120, 500, 350, 80, 200,
                   150, 600, 400, 95, 250],
    "month": ["January", "January", "January", "January", "January",
              "February", "February", "February", "February", "February"]
}

df = pd.DataFrame(data)

# Wrap the DataFrame so it can answer natural language questions
sdf = SmartDataframe(df, config={"llm": llm})

Running Natural Language Queries

# Simple query
result = sdf.chat("What is the total sales in January?")
print(result)

# Aggregation query
result = sdf.chat("Which product sold the most units?")
print(result)

# Filtered query
result = sdf.chat("Show products with price above 100")
print(result)

# Statistical query
result = sdf.chat("What is the average price per category?")
print(result)
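LLM-generated answers are worth cross-checking against plain Pandas, at least while you are learning how the model interprets your wording. The sketch below rebuilds the sample data and computes the four answers by hand, assuming the units column is named `units_sold`. Note that "total sales" is ambiguous: it could mean units or revenue, so both are computed.

```python
import pandas as pd

# Same sample sales data as in the tutorial, written compactly.
df = pd.DataFrame({
    "product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset"] * 2,
    "category": ["Electronics", "Accessories", "Accessories",
                 "Electronics", "Accessories"] * 2,
    "price": [1200, 25, 50, 400, 75, 1300, 20, 45, 450, 80],
    "units_sold": [120, 500, 350, 80, 200, 150, 600, 400, 95, 250],
    "month": ["January"] * 5 + ["February"] * 5,
})

# "Total sales in January": units and revenue are both plausible readings.
january = df[df["month"] == "January"]
january_units = int(january["units_sold"].sum())
january_revenue = int((january["price"] * january["units_sold"]).sum())

# "Which product sold the most units?"
top_product = df.groupby("product")["units_sold"].sum().idxmax()

# "Show products with price above 100"
above_100 = df[df["price"] > 100]

# "What is the average price per category?"
avg_price = df.groupby("category")["price"].mean()

print(january_units)             # 1250
print(january_revenue)           # 221000
print(top_product)               # Mouse
print(len(above_100))            # 4
print(avg_price["Electronics"])  # 837.5
```

If a PandasAI answer disagrees with these numbers, rephrasing the question more precisely (for example, "total units sold" instead of "total sales") is usually the first thing to try.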

Complex Queries

# Revenue calculation
result = sdf.chat(
    "Calculate total revenue (price x units_sold) for each product, "
