PandasAI: Data Analysis with Natural Language in Python
Introduction
Imagine being able to ask your data questions in plain English, without writing complex SQL queries or remembering intricate Pandas syntax. That is exactly what PandasAI offers: a Python library that integrates Large Language Model (LLM) capabilities with Pandas DataFrames.
PandasAI enables data analysts, data scientists, and even non-technical stakeholders to perform data analysis simply by typing questions in natural language. The library automatically converts your questions into executable Python/Pandas code, runs it, and returns the results.
In this tutorial, we walk through PandasAI end to end, from installation and LLM configuration to practical business analytics.
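To make the question-to-code-to-result flow described above concrete, here is a toy sketch in plain Python. It is not how PandasAI is implemented: `fake_llm` and `answer` are hypothetical names invented for illustration, and the "model" is a hard-coded stub that recognizes only one kind of question.

```python
# Toy illustration of the question -> code -> execution -> result loop.
# fake_llm is a hypothetical stand-in for a real model; it is not part of PandasAI.

def fake_llm(question: str) -> str:
    """Pretend to be an LLM: return Python source that answers the question."""
    if "total" in question.lower():
        return "result = sum(row['units_sold'] for row in rows)"
    raise ValueError("this stub only understands 'total' questions")


def answer(question: str, rows: list) -> object:
    code = fake_llm(question)      # 1. translate the question into code
    namespace = {"rows": rows}     # 2. expose the data to the generated code
    exec(code, namespace)          # 3. run the generated code
    return namespace["result"]     # 4. hand the result back to the caller


rows = [
    {"product": "Mouse", "units_sold": 500},
    {"product": "Laptop", "units_sold": 120},
]
total = answer("What is the total units sold?", rows)
print(total)  # 620
```

Because the generated code is executed, anything the model writes runs with the same permissions as your own script, so it is worth reviewing generated code before trusting results on sensitive data.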
Prerequisites
Before getting started, make sure you have:
- Python 3.9 or later
- pip package manager
- OpenAI API key (or a local LLM served through Ollama)
- Basic understanding of Pandas DataFrames
Installation
Basic Installation
pip install pandasai
Installation with Additional Dependencies
# With plotting support
pip install "pandasai[plotting]"
# With Excel support
pip install "pandasai[excel]"
# Full installation
pip install "pandasai[all]"
Verify Installation
import pandasai
print(pandasai.__version__)
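If you would rather check the installation without importing the package (for example, in a setup script that should fail gracefully), the standard library can look up the installed version by distribution name. This is a minimal sketch; `installed_version` is a helper name made up for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("pandasai")
print(f"pandasai: {found or 'not installed'}")
```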
Setup and LLM Configuration
Using OpenAI
import os
from pandasai import SmartDataframe
from pandasai.llm.openai import OpenAI
# Set API key
os.environ["OPENAI_API_KEY"] = "sk-your-api-key-here"
# Or directly during initialization
llm = OpenAI(api_token="sk-your-api-key-here", model="gpt-4o")
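Hard-coding a key in a script makes it easy to leak through version control. A common alternative is to export the key in your shell and read it at startup, failing early with a clear message when it is missing. The sketch below assumes the key lives in `OPENAI_API_KEY`; `require_api_key` is a hypothetical helper, not part of PandasAI.

```python
import os


def require_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Read an API key from the environment, or raise a descriptive error."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {env_var} environment variable before creating the LLM client."
        )
    return key


# Intended usage with the OpenAI wrapper shown above:
#   llm = OpenAI(api_token=require_api_key(), model="gpt-4o")
```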
Using Local LLMs with Ollama
If you want to run analysis offline or save on API costs, you can use Ollama:
# Install Ollama first
curl -fsSL https://ollama.ai/install.sh | sh
# Download models
ollama pull llama3
ollama pull codellama
from pandasai.llm.local_llm import LocalLLM
# Configure local LLM via Ollama
llm = LocalLLM(
    api_base="http://localhost:11434/v1",
    model="llama3"
)
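A local setup fails in a confusing way if the Ollama server is not running, so it can help to probe the endpoint before sending any queries. The sketch below uses only the standard library. It assumes the OpenAI-compatible base URL shown above and that the server answers on a `/models` path under it; `endpoint_is_reachable` is a hypothetical helper name.

```python
import urllib.error
import urllib.request


def endpoint_is_reachable(api_base: str, timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at `<api_base>/models`."""
    url = api_base.rstrip("/") + "/models"
    # Skip any configured HTTP proxy: the target is expected to be local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, and similar.
        return False


print(endpoint_is_reachable("http://localhost:11434/v1"))
```

If this prints `False`, start the Ollama server and confirm the model has been pulled before constructing the LLM object.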
Using Azure OpenAI
from pandasai.llm.azure_openai import AzureOpenAI
llm = AzureOpenAI(
    api_token="your-azure-api-key",
    azure_endpoint="https://your-resource.openai.azure.com/",
    api_version="2024-02-15-preview",
    deployment_name="gpt-4o"
)
Basic Queries with SmartDataframe
Creating a SmartDataframe
import pandas as pd
from pandasai import SmartDataframe
# Create sample sales data
data = {
    "product": ["Laptop", "Mouse", "Keyboard", "Monitor", "Headset",
                "Laptop", "Mouse", "Keyboard", "Monitor", "Headset"],
    "category": ["Electronics", "Accessories", "Accessories", "Electronics", "Accessories",
                 "Electronics", "Accessories", "Accessories", "Electronics", "Accessories"],
    "price": [1200, 25, 50, 400, 75,
              1300, 20, 45, 450, 80],
    "units_sold": [120, 500, 350, 80, 200,
                   150, 600, 400, 95, 250],
    "month": ["January", "January", "January", "January", "January",
              "February", "February", "February", "February", "February"]
}
df = pd.DataFrame(data)
sdf = SmartDataframe(df, config={"llm": llm})
Running Natural Language Queries
# Simple query
result = sdf.chat("What is the total sales in January?")
print(result)
# Aggregation query
result = sdf.chat("Which product sold the most units?")
print(result)
# Filtered query
result = sdf.chat("Show products with price above 100")
print(result)
# Statistical query
result = sdf.chat("What is the average price per category?")
print(result)
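LLM-generated answers are worth spot-checking, especially for ambiguous questions: "total sales" could mean units or revenue. The sketch below recomputes reference values for the queries above from the same sample data in plain Python, so the check depends on neither the LLM nor the generated code. The row layout is an illustrative choice for this example.

```python
from collections import defaultdict

# Same sample data as above: (product, category, price, units sold, month).
rows = [
    ("Laptop", "Electronics", 1200, 120, "January"),
    ("Mouse", "Accessories", 25, 500, "January"),
    ("Keyboard", "Accessories", 50, 350, "January"),
    ("Monitor", "Electronics", 400, 80, "January"),
    ("Headset", "Accessories", 75, 200, "January"),
    ("Laptop", "Electronics", 1300, 150, "February"),
    ("Mouse", "Accessories", 20, 600, "February"),
    ("Keyboard", "Accessories", 45, 400, "February"),
    ("Monitor", "Electronics", 450, 95, "February"),
    ("Headset", "Accessories", 80, 250, "February"),
]

# "Total sales in January" under both readings: units and revenue.
january = [r for r in rows if r[4] == "January"]
january_units = sum(r[3] for r in january)
january_revenue = sum(r[2] * r[3] for r in january)

# Product with the most units sold across both months.
units_by_product = defaultdict(int)
for product, _, _, units, _ in rows:
    units_by_product[product] += units
best_seller = max(units_by_product, key=units_by_product.get)

# Rows with a price above 100.
expensive = [r for r in rows if r[2] > 100]

# Average price per category.
prices_by_category = defaultdict(list)
for _, category, price, _, _ in rows:
    prices_by_category[category].append(price)
avg_price = {c: sum(p) / len(p) for c, p in prices_by_category.items()}

print(january_units, january_revenue)              # 1250 221000
print(best_seller, units_by_product[best_seller])  # Mouse 1100
print(len(expensive))                              # 4
print(avg_price["Electronics"], round(avg_price["Accessories"], 2))  # 837.5 49.17
```

If an answer from `sdf.chat` disagrees with these numbers, rephrasing the question to name the exact columns usually removes the ambiguity.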
Complex Queries
# Revenue calculation
result = sdf.chat(
    "Calculate total revenue (price x units_sold) for each product, "