AI Segment Builder
This document covers the AI-powered segment builder that converts natural language queries into segment criteria.
Overview
The AI Segment Builder allows users to create segment rules using natural language instead of manually constructing JSON criteria. Users describe their target audience in plain English, and the AI translates it into valid segment criteria.
User: "Find platinum members who haven't booked in 60 days"
AI Output:
{
"$and": [
{ "membershipTier": "PLATINUM" },
{ "lastBookingAt": { "$lt": "2025-10-17T00:00:00Z" } }
]
}
Architecture
User Query: "High-value members at risk of churning"
→ AI Service: GPT-4o-mini with prompt + validation
→ Output:
{ "$and": [
{ "lifetimeValue": { "$gte": 100000 } },
{ "churnRiskScore": { "$gte": 70 } }
]}
Components
| Component | Description |
|---|---|
| SegmentBuilderService | Main orchestrator service |
| SegmentBuilderPromptBuilder | Constructs system/user prompts |
| SegmentBuilderValidator | Validates AI output against field registry |
API Endpoints
Build Segment from Natural Language
POST /v1/ai/segments/build
Converts a natural language query into segment criteria.
Request Body
{
"query": "Find platinum and diamond members who are active",
"tenantId": "tenant_123"
}
Response: 200 OK
{
"success": true,
"result": {
"criteria": {
"$and": [
{ "membershipTier": { "$in": ["PLATINUM", "DIAMOND"] } },
{ "status": "ACTIVE" }
]
},
"explanation": "Platinum and diamond tier members with active status",
"fieldsMapped": ["membershipTier", "status"],
"confidence": 0.95,
"ambiguities": []
},
"previewCount": 342,
"usage": {
"promptTokens": 1240,
"completionTokens": 89
}
}
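As a rough client-side illustration of this endpoint, the sketch below prepares the request and summarizes a successful response. The host in `BASE_URL` and both helper names are hypothetical, and authentication is omitted because this document does not describe it.

```python
import json
import urllib.request

# Hypothetical base URL; this document does not specify a host.
BASE_URL = "https://api.example.com"


def make_build_request(query: str, tenant_id: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/ai/segments/build request."""
    body = json.dumps({"query": query, "tenantId": tenant_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/ai/segments/build",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize_build_response(payload: dict) -> str:
    """Turn a successful response body into a one-line summary."""
    result = payload["result"]
    return (
        f"{payload['previewCount']} customers match "
        f"(confidence {result['confidence']:.2f}): {result['explanation']}"
    )
```

Passing the prepared request to `urllib.request.urlopen` would send it; the response body can then be parsed with `json.loads` and handed to `summarize_build_response`.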
Refine Existing Criteria
POST /v1/ai/segments/refine
Refines existing segment criteria with additional natural language instructions.
Request Body
{
"instruction": "Also require email opt-in",
"currentCriteria": {
"membershipTier": { "$in": ["PLATINUM", "DIAMOND"] }
},
"tenantId": "tenant_123"
}
Response: 200 OK
{
"success": true,
"result": {
"criteria": {
"$and": [
{ "membershipTier": { "$in": ["PLATINUM", "DIAMOND"] } },
{ "emailOptIn": true }
]
},
"explanation": "Platinum and diamond members who opted in to email",
"fieldsMapped": ["membershipTier", "emailOptIn"],
"confidence": 0.92,
"ambiguities": []
},
"previewCount": 298
}
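The shape of the refined criteria in this example can be reproduced locally: the new condition is AND-ed onto the existing criteria. The helper below is a minimal sketch of that shape only. `add_condition` is a hypothetical name, and in the actual service the AI decides how an instruction changes the criteria.

```python
def add_condition(current: dict, extra: dict) -> dict:
    """AND a new condition onto existing criteria without nesting $and."""
    if not current:
        # Nothing to combine with, so the new condition stands alone.
        return extra
    if set(current) == {"$and"}:
        # Already a top-level $and: extend its list instead of nesting.
        conditions = list(current["$and"])
    else:
        conditions = [current]
    return {"$and": conditions + [extra]}
```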
Preview Count
POST /v1/ai/segments/preview
Returns the count of customers matching the criteria.
Request Body
{
"criteria": {
"engagementScore": { "$gte": 70 }
},
"tenantId": "tenant_123"
}
Response: 200 OK
{
"count": 456
}
Get Available Fields
GET /v1/ai/segments/fields
Returns all available fields for segment criteria.
Response: 200 OK
{
"fields": [
{
"name": "engagementScore",
"type": "number",
"operators": ["eq", "neq", "lt", "lte", "gt", "gte"],
"description": "Engagement score (0-100)"
},
{
"name": "membershipTier",
"type": "string",
"operators": ["eq", "neq", "in", "nin"],
"description": "Membership tier (e.g., PLATINUM, GOLD, SILVER)"
}
]
}
Criteria Format
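The AI generates criteria using MongoDB-style operators. As the examples in this document show, a bare value such as { "status": "ACTIVE" } is treated as an equality match: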
Operators
| Operator | Description | Example |
|---|---|---|
| $eq | Equals | { "status": { "$eq": "ACTIVE" } } |
| $neq | Not equals | { "status": { "$neq": "DELETED" } } |
| $gt | Greater than | { "engagementScore": { "$gt": 70 } } |
| $gte | Greater than or equal | { "engagementScore": { "$gte": 70 } } |
| $lt | Less than | { "handicap": { "$lt": 10 } } |
| $lte | Less than or equal | { "churnRiskScore": { "$lte": 30 } } |
| $in | In array | { "membershipTier": { "$in": ["GOLD", "PLATINUM"] } } |
| $nin | Not in array | { "status": { "$nin": ["MERGED", "DELETED"] } } |
| $contains | String contains | { "email": { "$contains": "@company.com" } } |
| $startsWith | String starts with | { "firstName": { "$startsWith": "Jo" } } |
| $endsWith | String ends with | { "email": { "$endsWith": ".co.za" } } |
Combinators
| Combinator | Description | Example |
|---|---|---|
| $and | All conditions must match | { "$and": [...conditions] } |
| $or | Any condition must match | { "$or": [...conditions] } |
| $not | Negate condition | { "$not": { "status": "DELETED" } } |
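To make the semantics of the operators and combinators above concrete, here is a minimal in-memory matcher. It is a sketch, not the service's query engine: `matches` and `OPERATORS` are hypothetical names, and treating a missing field as a non-match for ordering and string operators is an assumption.

```python
from typing import Any, Callable

# One predicate per operator; each takes (actual field value, operand).
# Assumption: a missing field (None) never matches ordering or string operators.
OPERATORS: dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda actual, operand: actual == operand,
    "$neq": lambda actual, operand: actual != operand,
    "$gt": lambda actual, operand: actual is not None and actual > operand,
    "$gte": lambda actual, operand: actual is not None and actual >= operand,
    "$lt": lambda actual, operand: actual is not None and actual < operand,
    "$lte": lambda actual, operand: actual is not None and actual <= operand,
    "$in": lambda actual, operand: actual in operand,
    "$nin": lambda actual, operand: actual not in operand,
    "$contains": lambda actual, operand: isinstance(actual, str) and operand in actual,
    "$startsWith": lambda actual, operand: isinstance(actual, str) and actual.startswith(operand),
    "$endsWith": lambda actual, operand: isinstance(actual, str) and actual.endswith(operand),
}


def matches(customer: dict, criteria: dict) -> bool:
    """Return True if the customer record satisfies every entry in criteria."""
    for key, value in criteria.items():
        if key == "$and":
            ok = all(matches(customer, child) for child in value)
        elif key == "$or":
            ok = any(matches(customer, child) for child in value)
        elif key == "$not":
            ok = not matches(customer, value)
        elif isinstance(value, dict):
            # Field with one or more operators, e.g. {"$gte": 70}.
            # An unknown operator raises KeyError.
            ok = all(
                OPERATORS[op](customer.get(key), operand)
                for op, operand in value.items()
            )
        else:
            # Bare value, e.g. {"status": "ACTIVE"}, read as equality.
            ok = customer.get(key) == value
        if not ok:
            return False
    return True
```

For example, a record with `"status": "ACTIVE"` satisfies `{ "$not": { "status": "DELETED" } }`, while a record with no `handicap` value does not satisfy `{ "handicap": { "$lt": 10 } }` under the missing-field assumption.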
Date Placeholders
The AI can use date placeholders that are resolved at query time:
| Placeholder | Description | Example |
|---|---|---|
| {{N_DAYS_AGO}} | N days before current date | {{30_DAYS_AGO}} |
| {{N_WEEKS_AGO}} | N weeks before current date | {{2_WEEKS_AGO}} |
| {{N_MONTHS_AGO}} | N months before current date | {{3_MONTHS_AGO}} |
| {{START_OF_YEAR}} | January 1st of current year | 2025-01-01T00:00:00Z |
| {{START_OF_MONTH}} | 1st of current month | 2025-12-01T00:00:00Z |
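The placeholder table can be read as a small resolution function. The sketch below is one possible interpretation, with two assumptions that this document does not confirm: results are truncated to midnight UTC (as the example values suggest), and month arithmetic clamps to the last day of a shorter month. `resolve_placeholder` is a hypothetical name.

```python
import calendar
import re
from datetime import datetime, timedelta, timezone

PLACEHOLDER = re.compile(
    r"^\{\{(?:(\d+)_(DAYS|WEEKS|MONTHS)_AGO|(START_OF_YEAR|START_OF_MONTH))\}\}$"
)


def resolve_placeholder(token: str, now: datetime) -> str:
    """Resolve a date placeholder to an ISO 8601 UTC string.

    `now` must be timezone-aware.
    """
    match = PLACEHOLDER.match(token)
    if match is None:
        raise ValueError(f"Not a date placeholder: {token}")
    count, unit, anchor = match.groups()

    # Assumption: resolved values are truncated to midnight UTC.
    day = now.astimezone(timezone.utc).replace(
        hour=0, minute=0, second=0, microsecond=0
    )

    if anchor == "START_OF_YEAR":
        day = day.replace(month=1, day=1)
    elif anchor == "START_OF_MONTH":
        day = day.replace(day=1)
    elif unit == "DAYS":
        day -= timedelta(days=int(count))
    elif unit == "WEEKS":
        day -= timedelta(weeks=int(count))
    else:  # MONTHS
        # Count months since year 0, subtract, then split back into year/month.
        total = day.year * 12 + (day.month - 1) - int(count)
        year, month_index = divmod(total, 12)
        month = month_index + 1
        # Assumption: clamp to the last day of a shorter target month.
        last_day = calendar.monthrange(year, month)[1]
        day = day.replace(year=year, month=month, day=min(day.day, last_day))

    return day.strftime("%Y-%m-%dT%H:%M:%SZ")
```

With `now` set to 16 December 2025 (UTC), `{{60_DAYS_AGO}}` resolves to `2025-10-17T00:00:00Z`, which matches the value in the Overview example.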
Available Fields
Identity & Contact
| Field | Type | Operators | Description |
|---|---|---|---|
| email | string | eq, neq, contains, startsWith, endsWith | Customer email |
| phoneNumber | string | eq, neq, startsWith | E.164 format phone |
| firstName | string | eq, neq, contains, startsWith | First name |
| lastName | string | eq, neq, contains, startsWith | Last name |
| country | string | eq, neq, in, nin | ISO 3166-1 alpha-2 code |
| city | string | eq, neq, in, contains | City name |
Status & Lifecycle
| Field | Type | Values | Description |
|---|---|---|---|
| status | enum | ACTIVE, INACTIVE, MERGED, DELETED | Customer status |
| lifecycleStage | enum | SUBSCRIBER, LEAD, OPPORTUNITY, CUSTOMER, ADVOCATE | Lifecycle stage |
| tags | array | - | Customer tags |
Membership
| Field | Type | Description |
|---|---|---|
| membershipTier | string | Membership tier (PLATINUM, GOLD, etc.) |
| membershipStatus | string | Membership status |
| memberSince | datetime | Membership start date |
| membershipExpiry | datetime | Membership expiry date |
Marketing Consent
| Field | Type | Description |
|---|---|---|
| emailOptIn | boolean | Email marketing consent |
| smsOptIn | boolean | SMS marketing consent |
| pushOptIn | boolean | Push notification consent |
| whatsappOptIn | boolean | WhatsApp consent |
Engagement Scores
| Field | Type | Description |
|---|---|---|
| engagementScore | number | Engagement score (0-100) |
| churnRiskScore | number | Churn risk score (0-100) |
| lifetimeValue | number | Lifetime value in cents |
| predictedLtv | number | Predicted lifetime value |
Activity Metrics
| Field | Type | Description |
|---|---|---|
| lastActivityAt | datetime | Last activity timestamp |
| activityCount30d | number | Activities in last 30 days |
| activityCount90d | number | Activities in last 90 days |
| bookingCount30d | number | Bookings in last 30 days |
| bookingCount90d | number | Bookings in last 90 days |
| lastBookingAt | datetime | Last booking timestamp |
| lastEmailOpenAt | datetime | Last email open |
| lastEmailClickAt | datetime | Last email click |
Golf Data
| Field | Type | Description |
|---|---|---|
| handicap | number | Golf handicap (0-54) |
| handicapProvider | string | Provider (SAGA, USGA) |
| homeClubId | string | Home club ID |
| homeClubName | string | Home club name |
Example Queries
| Natural Language Query | Generated Criteria |
|---|---|
| "Platinum and diamond members" | { "membershipTier": { "$in": ["PLATINUM", "DIAMOND"] } } |
| "Golfers with handicap under 10" | { "handicap": { "$lt": 10 } } |
| "Active customers who haven't booked in 60 days" | { "$and": [{ "status": "ACTIVE" }, { "lastBookingAt": { "$lt": "{{60_DAYS_AGO}}" } }] } |
| "High engagement, low churn risk" | { "$and": [{ "engagementScore": { "$gte": 70 } }, { "churnRiskScore": { "$lte": 30 } }] } |
| "Email subscribers in South Africa" | { "$and": [{ "emailOptIn": true }, { "country": "ZA" }] } |
| "New members this year" | { "memberSince": { "$gte": "{{START_OF_YEAR}}" } } |
| "Inactive members with high LTV" | { "$and": [{ "status": "INACTIVE" }, { "lifetimeValue": { "$gte": 500000 } }] } |
Error Handling
Error Codes
| Code | Description | Resolution |
|---|---|---|
| AMBIGUOUS_QUERY | Query is too vague (confidence < 0.5) | Rephrase the query using the returned suggestions |
| INVALID_FIELD | AI returned unknown field | Check the field against GET /v1/ai/segments/fields |
| PARSE_ERROR | AI returned invalid JSON | Retry or rephrase |
| AI_ERROR | OpenAI API failure | Check API key, retry |
Ambiguous Query Response
When confidence falls below 0.5, the request fails with AMBIGUOUS_QUERY and the response includes suggestions for clarifying the query:
{
"success": false,
"error": {
"code": "AMBIGUOUS_QUERY",
"message": "Query is too ambiguous to build segment",
"suggestions": [
"What defines 'high engagement'? Specify a score threshold.",
"Do you want email opt-in customers only?"
]
}
}
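A client can branch on the error code to decide what to do next. The following sketch maps a response body to an action following the Resolution column of the error table. The action names and the `next_step` helper are hypothetical, not part of the API.

```python
# Codes whose documented resolution includes retrying.
RETRYABLE = {"PARSE_ERROR", "AI_ERROR"}


def next_step(response: dict) -> tuple[str, list[str]]:
    """Map a response body to a client action plus hints to show the user."""
    if response.get("success"):
        return ("use_criteria", [])

    error = response.get("error", {})
    code = error.get("code")
    hints = error.get("suggestions", [])

    if code == "AMBIGUOUS_QUERY":
        # Surface the suggestions so the user can rephrase.
        return ("ask_user_to_clarify", hints)
    if code == "INVALID_FIELD":
        return ("check_available_fields", hints)
    if code in RETRYABLE:
        return ("retry", hints)
    # Unknown or missing code: do not guess.
    return ("report_error", hints)
```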
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| AI_OPENAI_KEY | OpenAI API key | Required |
| AI_MODEL | Model to use | gpt-4o-mini |
| AI_MAX_TOKENS | Max response tokens | 2048 |
| AI_TEMP | Temperature (0-1) | 0.2 |
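One way to read these variables with the documented defaults is sketched below. `AiConfig` and `load_ai_config` are hypothetical names, and rejecting a temperature outside 0-1 is an assumption based on the range in the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class AiConfig:
    api_key: str
    model: str
    max_tokens: int
    temperature: float


def load_ai_config(env: Mapping[str, str] = os.environ) -> AiConfig:
    """Read the AI settings, applying the defaults from the table above."""
    api_key = env.get("AI_OPENAI_KEY")
    if not api_key:
        raise RuntimeError("AI_OPENAI_KEY is required")

    temperature = float(env.get("AI_TEMP", "0.2"))
    # Assumption: values outside the documented 0-1 range are rejected.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("AI_TEMP must be between 0 and 1")

    return AiConfig(
        api_key=api_key,
        model=env.get("AI_MODEL", "gpt-4o-mini"),
        max_tokens=int(env.get("AI_MAX_TOKENS", "2048")),
        temperature=temperature,
    )
```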
Validation
The validator ensures AI output is safe and valid:
- Field validation: All fields must exist in registry
- Operator validation: Operators must be valid for field type
- Enum validation: Enum values must be in allowed list
- Type validation: Values must match field types
- Date normalization: Placeholders resolved to ISO strings
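The first four checks can be sketched as a recursive walk over the criteria. This is an illustration rather than the `SegmentBuilderValidator` implementation: the registry shape is modelled on the fields endpoint, the operator sets for `status` and `emailOptIn` are assumptions, and date normalization is left out.

```python
# Excerpt of a field registry. The operators listed for `status` and
# `emailOptIn` are assumptions; this document does not specify them.
REGISTRY = {
    "engagementScore": {"type": "number", "operators": {"eq", "neq", "lt", "lte", "gt", "gte"}},
    "membershipTier": {"type": "string", "operators": {"eq", "neq", "in", "nin"}},
    "status": {
        "type": "enum",
        "operators": {"eq", "neq", "in", "nin"},
        "values": {"ACTIVE", "INACTIVE", "MERGED", "DELETED"},
    },
    "emailOptIn": {"type": "boolean", "operators": {"eq", "neq"}},
}
PYTHON_TYPES = {"number": (int, float), "string": str, "enum": str, "boolean": bool}
COMBINATORS = {"$and", "$or", "$not"}


def check_value(name: str, field: dict, item: object) -> list[str]:
    """Type validation, then enum validation, for a single operand."""
    expected = PYTHON_TYPES[field["type"]]
    # bool is a subclass of int in Python, so reject it explicitly for numbers.
    is_bool_as_number = field["type"] == "number" and isinstance(item, bool)
    if not isinstance(item, expected) or is_bool_as_number:
        return [f"{name} expects {field['type']}, got {type(item).__name__}"]
    if "values" in field and item not in field["values"]:
        return [f"{item} is not an allowed value for {name}"]
    return []


def validate(criteria: dict) -> list[str]:
    """Return a list of problems; an empty list means the criteria passed."""
    errors: list[str] = []
    for key, value in criteria.items():
        if key in COMBINATORS:
            children = [value] if key == "$not" else value
            for child in children:
                errors.extend(validate(child))
            continue
        field = REGISTRY.get(key)
        if field is None:  # field validation
            errors.append(f"Unknown field: {key}")
            continue
        # A bare value is treated as an equality condition.
        conditions = value if isinstance(value, dict) else {"$eq": value}
        for op, operand in conditions.items():
            if op.lstrip("$") not in field["operators"]:  # operator validation
                errors.append(f"Operator {op} not allowed for {key}")
                continue
            operands = operand if isinstance(operand, list) else [operand]
            for item in operands:
                errors.extend(check_value(key, field, item))
    return errors
```

Collecting every problem instead of stopping at the first one is a design choice of this sketch; it lets a caller report all issues in a single response.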
Field Suggestions
When an unknown field name looks like a typo, the validator suggests similar registered fields:
{
"success": false,
"error": {
"code": "INVALID_FIELD",
"message": "Unknown field: engagmentScore",
"suggestions": ["engagementScore"]
}
}
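Suggestions of this kind can be produced with fuzzy string matching. The sketch below uses `difflib.get_close_matches` from the Python standard library. The matching technique, the 0.8 cutoff, and the helper names are assumptions, since this document does not say how the validator finds similar fields.

```python
import difflib

# Excerpt of the field names documented above.
KNOWN_FIELDS = [
    "engagementScore",
    "churnRiskScore",
    "lifetimeValue",
    "membershipTier",
    "emailOptIn",
    "email",
]


def suggest_fields(unknown: str, known: list[str] = KNOWN_FIELDS) -> list[str]:
    """Return up to three known field names that closely resemble `unknown`."""
    # Assumption: 0.8 is strict enough to avoid unrelated suggestions.
    return difflib.get_close_matches(unknown, known, n=3, cutoff=0.8)


def invalid_field_error(unknown: str) -> dict:
    """Build an INVALID_FIELD error body in the shape shown above."""
    return {
        "success": False,
        "error": {
            "code": "INVALID_FIELD",
            "message": f"Unknown field: {unknown}",
            "suggestions": suggest_fields(unknown),
        },
    }
```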