SettleSentry: Payment Collection AI Agent

SettleSentry is a conversational payment collection agent for services where customers may have an outstanding amount due, such as cloud bills, mobile plans, subscriptions, or other recurring service balances. It verifies the customer first, shows the amount due only after verification, and guides payment collection through a controlled, policy-governed workflow.

Note

SettleSentry guides payment collection to closure in under 9 user turns, with an average automated completion time of 1 min 14 sec, full policy compliance, zero PII leaks, and no premature payment calls.

The core design principle is separation of conversation intelligence from payment authority:

Deterministic workflow and policy gates control verification, balance disclosure, payment confirmation, and payment execution.
LLMs can be used progressively: for parsing, response phrasing, or autonomous tool orchestration.
Even in autonomous mode, the LLM does not own payment authority; it can only call phase-scoped tools backed by deterministic operations and policy checks.
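Here is a minimal Python sketch of the phase-scoped tool idea. The phase names, tool names, and the `call_tool` helper are hypothetical and are not taken from the SettleSentry codebase. The sketch shows one property only: the LLM may request any tool by name, but a deterministic lookup decides whether that tool is exposed in the current phase.

```python
from enum import Enum
from typing import Callable


class Phase(Enum):
    # Hypothetical workflow phases; the real phase names may differ.
    ACCOUNT = "account"
    IDENTITY = "identity"
    AMOUNT = "amount"
    CARD = "card"
    CONFIRMATION = "confirmation"
    CLOSED = "closed"


# Hypothetical mapping: which tool names the LLM may call in each phase.
PHASE_TOOLS: dict[Phase, frozenset[str]] = {
    Phase.ACCOUNT: frozenset({"lookup_account"}),
    Phase.IDENTITY: frozenset({"submit_identity"}),
    Phase.AMOUNT: frozenset({"submit_amount"}),
    Phase.CARD: frozenset({"submit_card"}),
    Phase.CONFIRMATION: frozenset({"confirm_payment", "cancel"}),
    Phase.CLOSED: frozenset(),
}


def call_tool(phase: Phase, name: str, tools: dict[str, Callable[[], str]]) -> str:
    """Run a tool only if it is exposed in the current phase."""
    if name not in PHASE_TOOLS[phase]:
        # The LLM asked for a tool outside its phase scope: refuse instead of executing.
        return f"tool '{name}' is not available in phase '{phase.value}'"
    return tools[name]()
```

In this sketch, a request for `confirm_payment` during the amount phase is refused rather than executed. That refusal is the property the design principle above depends on.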

Why It Matters

Payment collection is a sensitive workflow. The agent must maintain multi-turn context, avoid premature tool calls, handle partial or out-of-order input, enforce identity verification, recover safely from failures, and protect sensitive identity and payment data.

SettleSentry demonstrates how this workflow can be automated without giving uncontrolled authority to an LLM. Language models can help interpret user input and phrase responses, while verification, balance disclosure, payment authorization, and API execution remain controlled by deterministic workflow and policy logic.

Core Capabilities

Multi-turn account verification and payment collection
Strict identity verification before balance disclosure
Policy-gated amount validation, card collection, and payment execution
Explicit confirmation before any payment API call
Recovery flows for verification, amount, card, API, cancellation, and terminal failure cases
Progressive LLM integration: parser, responder, and autonomous tool-calling modes with deterministic fallback boundaries
Four-mode ablation design: deterministic workflow, LLM parser, LLM parser/responder, and LLM autonomous tool orchestration
LLM-led autonomous mode over phase-scoped account, identity, amount, card, confirmation, lifecycle, and safety tools
Safety audit and deterministic fallback for autonomous LLM responses
Scenario filtering and exhaustive all-mode evaluation support
Scenario evaluator covering success, recovery, guardrail, correction, and closure paths
Evaluation-compatible interface

Architecture Overview

flowchart TD
    U[User Message] --> I[Agent Interface]
    I --> G[LangGraph Orchestration]
    G --> P[Input Understanding / Tool Layer]
    P --> S[Conversation State]
    S --> R{Routing + Policy Gates}

    R -->|Needs More Information| Q[Ask Next Required Field]
    R -->|Account Lookup Allowed| L[Lookup Account API]
    R -->|Verification Ready| V[In-Agent Identity Verification]
    R -->|Amount and Card Details Ready| C[Prepare Payment Confirmation]
    C --> K{Explicit User Confirmation?}
    K -->|Yes + Policy Allowed| X[Process Payment API]
    K -->|No / Cancel| Z[Close Safely]
    R -->|Terminal or Unsafe to Continue| Z

    Q --> M[Safe Response Context]
    L --> M
    V --> M
    C --> M
    X --> M
    Z --> M

    M --> W[Agent Response / Safety Fallback]
    W --> A[User-Facing Message]

Each user message is processed as one controlled workflow turn. The agent preserves structured state and recent context for short replies, corrections, retries, and out-of-order inputs, while deterministic policy gates control account lookup, verification, balance disclosure, confirmation, and payment execution across all modes.
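One way to picture the per-turn routing decision is as a pure function of the structured state. The sketch below is simplified and assumes a hypothetical `TurnState` with illustrative field names. It mirrors the flowchart branches but omits the policy checks and cancellation handling that the real workflow applies before payment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnState:
    # Hypothetical subset of conversation state; field names are illustrative.
    account_id: Optional[str] = None
    account_found: bool = False
    full_name: Optional[str] = None
    verification_factor: Optional[str] = None
    verified: bool = False
    amount: Optional[float] = None
    card_ready: bool = False
    confirmed: bool = False
    terminal: bool = False


def route(state: TurnState) -> str:
    """Pick the next workflow step, mirroring the branches in the flowchart."""
    if state.terminal:
        return "close_safely"
    if not state.account_id:
        return "ask_next_required_field"
    if not state.account_found:
        return "lookup_account"
    if not state.verified:
        # Verification runs only once both the name and one factor are present.
        if state.full_name and state.verification_factor:
            return "verify_identity"
        return "ask_next_required_field"
    if state.amount is None or not state.card_ready:
        return "ask_next_required_field"
    if not state.confirmed:
        return "prepare_confirmation"
    return "process_payment"
```

Out-of-order input fits this shape naturally. If a user volunteers an amount before verification, it can sit in `amount`, but `route` still returns `verify_identity` or `ask_next_required_field` until `verified` is true.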

For the full architecture, policy model, assumptions, and tradeoffs, see the Design Document.

Safety Model

SettleSentry keeps payment authority outside the LLM:

Balance is shown only after successful identity verification.
Payment amount is validated before card collection.
Payment processing requires valid payment details and explicit confirmation.
All payment-critical transitions pass deterministic policy checks.
Full card number and CVV are cleared after success, terminal failure, cancellation, or closure.
Out-of-order user input may be remembered, but policy gates still control sensitive actions.
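The sketch below covers two of these rules and assumes a hypothetical `PaymentState` model. The first is a deterministic gate that must pass before any payment API call. The second is a helper that drops the full card number and CVV at closure. The field and function names are illustrative and are not the project's own.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaymentState:
    # Hypothetical fields; the real state model may differ.
    verified: bool = False
    amount_valid: bool = False
    card_number: Optional[str] = None
    cvv: Optional[str] = None
    user_confirmed: bool = False


def can_process_payment(state: PaymentState) -> bool:
    """Deterministic gate: every condition must hold before the payment API call."""
    return (
        state.verified
        and state.amount_valid
        and state.card_number is not None
        and state.cvv is not None
        and state.user_confirmed
    )


def clear_sensitive_card_data(state: PaymentState) -> None:
    """Drop the full card number and CVV on success, terminal failure, cancellation, or closure."""
    state.card_number = None
    state.cvv = None
```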

For detailed safety rules and workflow decisions, see the Design Document.

Modes

The CLI supports four modes:

Mode	Input Understanding	Response Writing	Tool / Workflow Control	Use Case
`deterministic-workflow`	Deterministic parser	Deterministic responses	LangGraph workflow routing	Stable no-LLM baseline
`llm-parser-workflow`	LLM parser with deterministic fallback	Deterministic responses	LangGraph workflow routing	Flexible extraction with fixed response behavior
`llm-parser-responder-workflow`	LLM parser with deterministic fallback	LLM responder with deterministic fallback	LangGraph workflow routing	Natural extraction and response phrasing
`llm-autonomous-agent`	LLM interprets the turn	LLM-written response with safety audit/fallback	LLM tool selection over phase-scoped tools	Autonomous agent ablation mode

The default CLI mode is llm-autonomous-agent. Use deterministic-workflow when no OpenRouter API key is configured.

In every mode, payment authority remains deterministic and policy-controlled. The LLM does not verify identity, authorize balance disclosure, bypass policy gates, or process payment without explicit confirmation.
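The "LLM parser with deterministic fallback" pattern from the table can be sketched for a single field. The helper names are hypothetical. The `ACC` + digits format is an assumption inferred from the sample account ID `ACC1001`, and `llm_parse` stands in for whatever LLM call the real parser makes.

```python
import re
from typing import Callable, Optional

# Hypothetical account ID format, inferred from the sample "ACC1001".
ACCOUNT_ID_PATTERN = re.compile(r"\bACC[0-9]+\b", re.IGNORECASE)


def deterministic_account_id(text: str) -> Optional[str]:
    """Rule-based extraction, used when the LLM is unavailable or its output is invalid."""
    match = ACCOUNT_ID_PATTERN.search(text)
    return match.group(0).upper() if match else None


def parse_account_id(
    text: str,
    llm_parse: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Try the LLM parser first, then fall back to the deterministic parser."""
    if llm_parse is not None:
        try:
            candidate = llm_parse(text)
        except Exception:
            # Timeouts, rate limits, or malformed output must not break the turn.
            candidate = None
        # Accept the LLM result only if it passes the same deterministic format check.
        if candidate and ACCOUNT_ID_PATTERN.fullmatch(candidate):
            return candidate.upper()
    return deterministic_account_id(text)
```

Because the LLM's answer is validated with the same deterministic rule, the LLM stays in an advisory role. A malformed or failed LLM response degrades to rule-based parsing and never reaches the workflow.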

Tech Stack

Python 3.12
LangGraph for workflow orchestration
Pydantic and Pydantic Settings for schema/configuration validation
PydanticAI with OpenRouter for optional LLM parser, responder, and autonomous tool-orchestration behavior
HTTPX and Tenacity for API communication and retry handling
Typer and Rich for interactive CLI
Pytest for unit and workflow test coverage
uv for environment and execution management

Setup

From the repository root:

uv sync --all-packages

Configuration

LLM configuration is needed only for the llm-parser-workflow, llm-parser-responder-workflow, and llm-autonomous-agent modes; deterministic-workflow runs without it.

Start by copying the template and updating values for your environment:

# macOS/Linux
cp .env.example .env

# PowerShell
Copy-Item .env.example .env

Full template: .env.example

Runtime Configuration

Variable	Required	Default	Description
`OPENROUTER_API_KEY`	LLM modes only	unset	OpenRouter API key for LLM-enabled modes.
`OPENROUTER_ENABLED`	No	`true`	Enables OpenRouter-backed parser/responder/autonomous runtime.
`OPENROUTER_BASE_URL`	No	`https://openrouter.ai/api/v1`	OpenRouter API base URL.
`OPENROUTER_MODEL`	No	`openrouter/free`	OpenRouter model identifier.
`OPENROUTER_TIMEOUT_SECONDS`	No	`10`	LLM request timeout in seconds.
`OPENROUTER_TEMPERATURE`	No	`0.0`	LLM temperature for response variability.
`OPENROUTER_MAX_TOKENS`	No	`300`	Max tokens for LLM outputs.
`OPENROUTER_RETRIES`	No	`1`	Retry count for LLM calls.
`API_BASE_URL`	No	`https://example-payment-verification-api.local`	External payment/lookup API base URL.
`API_TIMEOUT_SECONDS`	No	`30`	API timeout in seconds.
`API_MAX_RETRIES`	No	`2`	Retry count for API calls.
`AGENT_POLICY_VERIFICATION_MAX_ATTEMPTS`	No	`3`	Max identity verification attempts before closure.
`AGENT_POLICY_PAYMENT_MAX_ATTEMPTS`	No	`3`	Max payment attempts before closure.
`AGENT_POLICY_ALLOW_PARTIAL_PAYMENTS`	No	`true`	Allows partial payment amounts.
`AGENT_POLICY_ALLOW_ZERO_BALANCE_PAYMENT`	No	`false`	Allows payment flow for zero-balance accounts.
`AGENT_POLICY_MAX_PAYMENT_AMOUNT`	No	unset	Optional hard cap across payment amounts.
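Here is a dependency-free sketch of how the `AGENT_POLICY_*` variables could feed amount validation. The project itself uses Pydantic Settings for configuration, but this sketch swaps in a standard-library dataclass that reads environment variables directly. Several details are assumptions: the names `PaymentPolicy` and `validate_amount`, the two-decimal rule, and the reading of `AGENT_POLICY_MAX_PAYMENT_AMOUNT` as a per-payment cap.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Optional


def _flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Parse a boolean environment variable, keeping the default when it is unset."""
    raw = env.get(name)
    return default if raw is None else raw.strip().lower() in {"1", "true", "yes"}


@dataclass(frozen=True)
class PaymentPolicy:
    # Defaults mirror the Runtime Configuration table.
    allow_partial_payments: bool = True
    allow_zero_balance_payment: bool = False
    max_payment_amount: Optional[Decimal] = None  # assumed to be a per-payment cap

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "PaymentPolicy":
        raw_max = env.get("AGENT_POLICY_MAX_PAYMENT_AMOUNT")
        return cls(
            allow_partial_payments=_flag(env, "AGENT_POLICY_ALLOW_PARTIAL_PAYMENTS", True),
            allow_zero_balance_payment=_flag(env, "AGENT_POLICY_ALLOW_ZERO_BALANCE_PAYMENT", False),
            max_payment_amount=Decimal(raw_max) if raw_max else None,
        )


def validate_amount(raw: str, balance: Decimal, policy: PaymentPolicy) -> Optional[str]:
    """Return an error message, or None if the amount may proceed to card collection."""
    try:
        amount = Decimal(raw.strip())
    except InvalidOperation:
        return "amount is not a number"
    if not amount.is_finite():  # rejects "NaN" and "Infinity"
        return "amount is not a number"
    if amount.as_tuple().exponent < -2:
        return "amount has more than two decimal places"
    if balance <= 0 and not policy.allow_zero_balance_payment:
        return "no outstanding balance to pay"
    if amount <= 0:
        return "amount must be positive"
    if balance > 0 and amount > balance:
        return "amount exceeds outstanding balance"
    if amount < balance and not policy.allow_partial_payments:
        return "partial payments are not allowed"
    if policy.max_payment_amount is not None and amount > policy.max_payment_amount:
        return "amount exceeds the configured maximum"
    return None
```

`Decimal` is used instead of `float` so that amounts such as INR 1250.75 compare exactly, with no binary rounding.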

Logging Configuration

Variable	Required	Default	Description
`LOG_LEVEL`	No	`INFO`	Application log level.
`LOG_FILE_ENABLED`	No	`true`	Enables file logging.
`LOG_CONSOLE_ENABLED`	No	`true`	Enables console logging.
`LOG_FILE_NAME`	No	unset	Optional explicit log filename (defaults to `<project_name>.log`).
`LOG_MAX_BYTES`	No	`2048000`	Max log file size before rotation.
`LOG_BACKUP_COUNT`	No	`5`	Number of rotated log files to retain.

Evaluator Configuration (`scripts/evaluate_agent.py`)

Variable	Required	Default	Description
`EVAL_REPORT_RETENTION`	No	`10`	Number of dated evaluation reports to keep.
`EVAL_LOCAL_REPEATS_DEFAULT`	No	`1`	Default repeat count for deterministic mode runs.
`EVAL_LLM_REPEATS_DEFAULT`	No	`1`	Default repeat count for LLM mode runs.
`EVAL_SCENARIO_RETRIES_DEFAULT`	No	`1`	Default per-scenario retry count in evaluator.
`EVAL_REPORT_WIDTH`	No	`160`	Console/report rendering width for evaluator output.

Run the Agent

# Deterministic workflow
uv run settlesentry chat --mode deterministic-workflow

# LLM parser with deterministic responses
uv run settlesentry chat --mode llm-parser-workflow

# LLM parser and LLM-written responses
uv run settlesentry chat --mode llm-parser-responder-workflow

# LLM autonomous tool-calling agent
uv run settlesentry chat --mode llm-autonomous-agent

# Show privacy-safe state after each turn
uv run settlesentry chat --mode llm-autonomous-agent --show-state

# Enable console debug logs
uv run settlesentry chat --mode llm-autonomous-agent --debug-logs

If no OpenRouter API key is configured, use deterministic-workflow mode.

Run with Docker Compose

SettleSentry can also be run from the published GitHub Container Registry image.

Pull the latest image:

docker compose pull

Run deterministic workflow mode, which does not require an LLM API key:

docker compose run --rm settlesentry

Run autonomous LLM tool-calling mode with OpenRouter configured in .env:

docker compose --profile llm run --rm settlesentry-autonomous

Example .env for LLM modes:

OPENROUTER_API_KEY=...
OPENROUTER_ENABLED=true

The Compose setup uses the public GHCR image by default:

ghcr.io/kayvanshah1/settlesentry-payment-collection-agent:latest

To build locally instead of pulling the published image:

docker compose -f compose.yaml -f compose.build.yaml run --rm settlesentry

Run the local build in autonomous mode:

docker compose -f compose.yaml -f compose.build.yaml --profile llm run --rm settlesentry-autonomous

Run Tests and Evaluation

Run the core test suite:

uv run pytest -q

Run mode-specific evaluation:

# Deterministic baseline: no LLM dependencies.
uv run python scripts/evaluate_agent.py --no-all --mode deterministic-workflow

# Hybrid mode: LLM parser with deterministic response generation.
uv run python scripts/evaluate_agent.py --no-all --mode llm-parser-workflow

# Hybrid mode: LLM parser and LLM responder (deterministic fallback remains available).
uv run python scripts/evaluate_agent.py --no-all --mode llm-parser-responder-workflow

# Autonomous mode: LLM-led, phase-scoped tool orchestration with safety/fallback controls.
uv run python scripts/evaluate_agent.py --no-all --mode llm-autonomous-agent

# Full exhaustive run: execute the complete scenario matrix across all configured modes.
uv run python scripts/evaluate_agent.py --all --exhaustive

Run a targeted autonomous scenario:

uv run python scripts/evaluate_agent.py --mode llm-autonomous-agent --no-all --exhaustive --scenario verification_exhaustion_closes

For evaluation methodology, metrics, and acceptance criteria, see Evaluation Approach.

Public Interface Contract

SettleSentry exposes a single turn-based interface:

Agent.next(user_input: str) -> {"message": str}

The interface is intentionally simple for evaluation and integration, while the implementation maintains session state, policy checks, graph orchestration, parser context, and external API boundaries internally.
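Here is a minimal sketch of that contract. It assumes a hypothetical `handle_turn(state, user_input)` callable that replaces the real LangGraph orchestration. The constructor is invented for illustration. Only the `next(user_input) -> {"message": str}` shape comes from the contract above.

```python
from typing import Callable


class Agent:
    """Turn-based facade: internal state stays private, each turn returns {"message": str}."""

    def __init__(self, handle_turn: Callable[[dict, str], str]) -> None:
        # Hypothetical constructor: the real package wires up state, policy, and graph itself.
        self._handle_turn = handle_turn
        self._state: dict = {}

    def next(self, user_input: str) -> dict[str, str]:
        if self._state.get("closed"):
            return {"message": "This conversation is closed."}
        return {"message": self._handle_turn(self._state, user_input)}


def toy_turn(state: dict, user_input: str) -> str:
    """Toy handler for the usage example: greet, capture one account ID, then close."""
    if "greeted" not in state:
        state["greeted"] = True
        return "Please share your account ID."
    state["account_id"] = user_input.strip()
    state["closed"] = True
    return "Thanks. This conversation is now closed."


agent = Agent(toy_turn)
print(agent.next("Hi"))       # {'message': 'Please share your account ID.'}
print(agent.next("ACC1001"))  # {'message': 'Thanks. This conversation is now closed.'}
```

Callers see only the message dictionary. Session state, the closed flag, and any card data stay inside the object.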

Project Structure

settlesentry/          # Installable package workspace
tests/                 # Unit and workflow tests
scripts/               # Scenario evaluator and helper scripts
docs/                  # Design, evaluation, and sample conversation docs
var/evaluation/        # Generated evaluation reports

For detailed package/module layout, see settlesentry/README.md.

Example Happy Path

USER: Hi
AGENT: Hello, I'm SettleSentry. I help with account verification and payment. Please share your account ID.

USER: ACC1001
AGENT: Account found. Please share your full name exactly as registered on the account.

USER: Nithin Jain
AGENT: Please share one verification factor: DOB in YYYY-MM-DD format, Aadhaar last 4 digits, or pincode.

USER: 1990-05-14
AGENT: Identity verified. Your outstanding balance is INR 1250.75. Please share the amount you would like to pay in INR.

... payment amount, card details, and confirmation collected ...

USER: yes
AGENT: Payment of INR 500.00 was processed successfully. Transaction ID: txn_.... This conversation is now closed.

Full happy-path, failure, retry, side-question, correction, and closure examples are documented in Sample Conversations.
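The verification prompt in the example accepts one of three factors. A deterministic classifier for that step might look like the sketch below. The exact formats are assumptions: DOB as `YYYY-MM-DD`, exactly 4 digits for the Aadhaar suffix, and a 6-digit pincode. The `classify_factor` name is also an assumption. Comparing the value against the account record is left to the verification step.

```python
import re
from datetime import date
from typing import Optional, Tuple

# Assumed formats: DOB as YYYY-MM-DD, Aadhaar last 4 digits, 6-digit Indian pincode.
DOB_RE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")
AADHAAR_LAST4_RE = re.compile(r"[0-9]{4}")
PINCODE_RE = re.compile(r"[0-9]{6}")


def classify_factor(text: str) -> Optional[Tuple[str, str]]:
    """Return (factor_kind, value) for one verification factor, or None if unrecognized."""
    value = text.strip()
    if DOB_RE.fullmatch(value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 1990-02-30
        except ValueError:
            return None
        return ("dob", value)
    if AADHAAR_LAST4_RE.fullmatch(value):
        return ("aadhaar_last4", value)
    if PINCODE_RE.fullmatch(value):
        return ("pincode", value)
    return None
```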

Documentation

Disclaimer

SettleSentry is a technical implementation and reference architecture for a payment collection agent. It is not intended for production payment processing as-is.

A production deployment would require additional security review, PCI-DSS controls, secrets management, persistent session storage, monitoring, audit logging, human escalation, fraud controls, and compliance validation.

Caution

Do not use real payment card data with this project. Use only sample or test payment data.

License

This project is licensed under the BSD 3-Clause License. See LICENSE for details.
