KayvanShah1/settlesentry-payment-collection-agent

SettleSentry: Payment Collection AI Agent

SettleSentry is a conversational payment collection agent for services where customers may have an outstanding amount due, such as cloud bills, mobile plans, subscriptions, or other recurring service balances. It verifies the customer first, shows the amount due only after verification, and guides payment collection through a controlled, policy-governed workflow.

Note

SettleSentry guides payment collection to closure in under 9 user turns, with an average automated completion time of 1 min 14 sec, full policy compliance, 0 PII leaks, and no premature payment calls.

The core design principle is separation of conversation intelligence from payment authority:

  • Deterministic workflow and policy gates control verification, balance disclosure, payment confirmation, and payment execution.
  • LLMs can be used progressively: for parsing, response phrasing, or autonomous tool orchestration.
  • Even in autonomous mode, the LLM does not own payment authority; it can only call phase-scoped tools backed by deterministic operations and policy checks.
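The idea of phase-scoped tools can be illustrated with a small sketch. The phase names and tool names below are hypothetical and are not taken from the SettleSentry codebase; the point is only that a tool call requested by the LLM is checked against a fixed allow-list for the current phase before anything runs.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical phase names, chosen for illustration only.
    LOOKUP = "lookup"
    VERIFICATION = "verification"
    PAYMENT = "payment"
    CLOSED = "closed"


# Hypothetical tool names; the real tool registry may differ.
TOOLS_BY_PHASE: dict[Phase, frozenset[str]] = {
    Phase.LOOKUP: frozenset({"lookup_account"}),
    Phase.VERIFICATION: frozenset({"submit_identity"}),
    Phase.PAYMENT: frozenset({"set_amount", "set_card", "confirm_payment"}),
    Phase.CLOSED: frozenset(),
}


def is_tool_allowed(phase: Phase, tool_name: str) -> bool:
    """Return True only if the tool is exposed in the current phase."""
    return tool_name in TOOLS_BY_PHASE[phase]
```

With this shape, an LLM that asks for `confirm_payment` during lookup is simply refused, regardless of how the request was phrased.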

Why It Matters

Payment collection is a sensitive workflow. The agent must maintain multi-turn context, avoid premature tool calls, handle partial or out-of-order input, enforce identity verification, recover safely from failures, and protect sensitive identity and payment data.

SettleSentry demonstrates how this workflow can be automated without giving uncontrolled authority to an LLM. Language models can help interpret user input and phrase responses, while verification, balance disclosure, payment authorization, and API execution remain controlled by deterministic workflow and policy logic.

Core Capabilities

  • Multi-turn account verification and payment collection
  • Strict identity verification before balance disclosure
  • Policy-gated amount validation, card collection, and payment execution
  • Explicit confirmation before any payment API call
  • Recovery flows for verification, amount, card, API, cancellation, and terminal failure cases
  • Progressive LLM integration: parser, responder, and autonomous tool-calling modes with deterministic fallback boundaries
  • Four-mode ablation design: deterministic workflow, LLM parser, LLM parser/responder, and LLM autonomous tool orchestration
  • LLM-led autonomous mode over phase-scoped account, identity, amount, card, confirmation, lifecycle, and safety tools
  • Safety audit and deterministic fallback for autonomous LLM responses
  • Scenario filtering and exhaustive all-mode evaluation support
  • Scenario evaluator covering success, recovery, guardrail, correction, and closure paths
  • Evaluation-compatible interface

Architecture Overview

flowchart TD
    U[User Message] --> I[Agent Interface]
    I --> G[LangGraph Orchestration]
    G --> P[Input Understanding / Tool Layer]
    P --> S[Conversation State]
    S --> R{Routing + Policy Gates}

    R -->|Needs More Information| Q[Ask Next Required Field]
    R -->|Account Lookup Allowed| L[Lookup Account API]
    R -->|Verification Ready| V[In-Agent Identity Verification]
    R -->|Amount and Card Details Ready| C[Prepare Payment Confirmation]
    C --> K{Explicit User Confirmation?}
    K -->|Yes + Policy Allowed| X[Process Payment API]
    K -->|No / Cancel| Z[Close Safely]
    R -->|Terminal or Unsafe to Continue| Z

    Q --> M[Safe Response Context]
    L --> M
    V --> M
    C --> M
    X --> M
    Z --> M

    M --> W[Agent Response / Safety Fallback]
    W --> A[User-Facing Message]

Each user message is processed as one controlled workflow turn. The agent preserves structured state and recent context for short replies, corrections, retries, and out-of-order inputs, while deterministic policy gates control account lookup, verification, balance disclosure, confirmation, and payment execution across all modes.

For the full architecture, policy model, assumptions, and tradeoffs, see the Design Document.

Safety Model

SettleSentry keeps payment authority outside the LLM:

  • Balance is shown only after successful identity verification.
  • Payment amount is validated before card collection.
  • Payment processing requires valid payment details and explicit confirmation.
  • All payment-critical transitions pass deterministic policy checks.
  • Full card number and CVV are cleared after success, terminal failure, cancellation, or closure.
  • Out-of-order user input may be remembered, but policy gates still control sensitive actions.

For detailed safety rules and workflow decisions, see the Design Document.

Modes

The CLI supports four modes:

| Mode | Input Understanding | Response Writing | Tool / Workflow Control | Use Case |
| --- | --- | --- | --- | --- |
| deterministic-workflow | Deterministic parser | Deterministic responses | LangGraph workflow routing | Stable no-LLM baseline |
| llm-parser-workflow | LLM parser with deterministic fallback | Deterministic responses | LangGraph workflow routing | Flexible extraction with fixed response behavior |
| llm-parser-responder-workflow | LLM parser with deterministic fallback | LLM responder with deterministic fallback | LangGraph workflow routing | Natural extraction and response phrasing |
| llm-autonomous-agent | LLM interprets the turn | LLM-written response with safety audit/fallback | LLM tool selection over phase-scoped tools | Autonomous agent ablation mode |

The default CLI mode is llm-autonomous-agent. Use deterministic-workflow when no OpenRouter API key is configured.

In every mode, payment authority remains deterministic and policy-controlled. The LLM does not verify identity, authorize balance disclosure, bypass policy gates, or process payment without explicit confirmation.

Tech Stack

  • Python 3.12
  • LangGraph for workflow orchestration
  • Pydantic and Pydantic Settings for schema/configuration validation
  • PydanticAI with OpenRouter for optional LLM parser, responder, and autonomous tool-orchestration behavior
  • HTTPX and Tenacity for API communication and retry handling
  • Typer and Rich for interactive CLI
  • Pytest for unit and workflow test coverage
  • uv for environment and execution management

Setup

From the repository root:

uv sync --all-packages

Configuration

LLM configuration is optional. It is required only for the LLM-enabled modes: llm-parser-workflow, llm-parser-responder-workflow, and llm-autonomous-agent.

Start by copying the template and updating values for your environment:

# macOS/Linux
cp .env.example .env

# PowerShell
Copy-Item .env.example .env

Full template: .env.example

Runtime Configuration

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| OPENROUTER_API_KEY | LLM modes only | unset | OpenRouter API key for LLM-enabled modes. |
| OPENROUTER_ENABLED | No | true | Enables OpenRouter-backed parser/responder/autonomous runtime. |
| OPENROUTER_BASE_URL | No | https://openrouter.ai/api/v1 | OpenRouter API base URL. |
| OPENROUTER_MODEL | No | openrouter/free | OpenRouter model identifier. |
| OPENROUTER_TIMEOUT_SECONDS | No | 10 | LLM request timeout in seconds. |
| OPENROUTER_TEMPERATURE | No | 0.0 | LLM temperature for response variability. |
| OPENROUTER_MAX_TOKENS | No | 300 | Max tokens for LLM outputs. |
| OPENROUTER_RETRIES | No | 1 | Retry count for LLM calls. |
| API_BASE_URL | No | https://example-payment-verification-api.local | External payment/lookup API base URL. |
| API_TIMEOUT_SECONDS | No | 30 | API timeout in seconds. |
| API_MAX_RETRIES | No | 2 | Retry count for API calls. |
| AGENT_POLICY_VERIFICATION_MAX_ATTEMPTS | No | 3 | Max identity verification attempts before closure. |
| AGENT_POLICY_PAYMENT_MAX_ATTEMPTS | No | 3 | Max payment attempts before closure. |
| AGENT_POLICY_ALLOW_PARTIAL_PAYMENTS | No | true | Allows partial payment amounts. |
| AGENT_POLICY_ALLOW_ZERO_BALANCE_PAYMENT | No | false | Allows payment flow for zero-balance accounts. |
| AGENT_POLICY_MAX_PAYMENT_AMOUNT | No | unset | Optional hard cap across payment amounts. |

Logging Configuration

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| LOG_LEVEL | No | INFO | Application log level. |
| LOG_FILE_ENABLED | No | true | Enables file logging. |
| LOG_CONSOLE_ENABLED | No | true | Enables console logging. |
| LOG_FILE_NAME | No | unset | Optional explicit log filename (defaults to <project_name>.log). |
| LOG_MAX_BYTES | No | 2048000 | Max log file size before rotation. |
| LOG_BACKUP_COUNT | No | 5 | Number of rotated log files to retain. |
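LOG_MAX_BYTES and LOG_BACKUP_COUNT map naturally onto the standard library's `RotatingFileHandler`. Whether SettleSentry uses this exact handler is an assumption; the sketch below only shows what size-based rotation with those two defaults looks like, and the `build_file_handler` name and log format are illustrative.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_file_handler(
    path: str, max_bytes: int = 2_048_000, backup_count: int = 5
) -> RotatingFileHandler:
    """Create a size-rotating file handler (illustrative helper, not project code)."""
    handler = RotatingFileHandler(
        path,
        maxBytes=max_bytes,        # rotate once the file would exceed this size
        backupCount=backup_count,  # keep this many rotated files
        encoding="utf-8",
        delay=True,                # do not open the file until the first record
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    return handler
```

With these values, the active log file plus five rotated backups stay at roughly 2 MB each.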

Evaluator Configuration (scripts/evaluate_agent.py)

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| EVAL_REPORT_RETENTION | No | 10 | Number of dated evaluation reports to keep. |
| EVAL_LOCAL_REPEATS_DEFAULT | No | 1 | Default repeat count for deterministic mode runs. |
| EVAL_LLM_REPEATS_DEFAULT | No | 1 | Default repeat count for LLM mode runs. |
| EVAL_SCENARIO_RETRIES_DEFAULT | No | 1 | Default per-scenario retry count in evaluator. |
| EVAL_REPORT_WIDTH | No | 160 | Console/report rendering width for evaluator output. |

Run the Agent

# Deterministic workflow
uv run settlesentry chat --mode deterministic-workflow

# LLM parser with deterministic responses
uv run settlesentry chat --mode llm-parser-workflow

# LLM parser and LLM-written responses
uv run settlesentry chat --mode llm-parser-responder-workflow

# LLM autonomous tool-calling agent
uv run settlesentry chat --mode llm-autonomous-agent

# Show privacy-safe state after each turn
uv run settlesentry chat --mode llm-autonomous-agent --show-state

# Enable console debug logs
uv run settlesentry chat --mode llm-autonomous-agent --debug-logs

If no OpenRouter API key is configured, use deterministic-workflow mode.

Run with Docker Compose

SettleSentry can also be run from the published GitHub Container Registry image.

Pull the latest image:

docker compose pull

Run deterministic workflow mode, which does not require an LLM API key:

docker compose run --rm settlesentry

Run autonomous LLM tool-calling mode with OpenRouter configured in .env:

docker compose --profile llm run --rm settlesentry-autonomous

Example .env for LLM modes:

OPENROUTER_API_KEY=...
OPENROUTER_ENABLED=true

The Compose setup uses the public GHCR image by default:

ghcr.io/kayvanshah1/settlesentry-payment-collection-agent:latest

To build locally instead of pulling the published image:

docker compose -f compose.yaml -f compose.build.yaml run --rm settlesentry

Run the local build in autonomous mode:

docker compose -f compose.yaml -f compose.build.yaml --profile llm run --rm settlesentry-autonomous

Run Tests and Evaluation

Run the core test suite:

uv run pytest -q

Run mode-specific evaluation:

# Deterministic baseline: no LLM dependencies.
uv run python scripts/evaluate_agent.py --no-all --mode deterministic-workflow

# Hybrid mode: LLM parser with deterministic response generation.
uv run python scripts/evaluate_agent.py --no-all --mode llm-parser-workflow

# Hybrid mode: LLM parser and LLM responder (deterministic fallback remains available).
uv run python scripts/evaluate_agent.py --no-all --mode llm-parser-responder-workflow

# Autonomous mode: LLM-led, phase-scoped tool orchestration with safety/fallback controls.
uv run python scripts/evaluate_agent.py --no-all --mode llm-autonomous-agent

# Full exhaustive run: execute the complete scenario matrix across all configured modes.
uv run python scripts/evaluate_agent.py --all --exhaustive

Run a targeted autonomous scenario:

uv run python scripts/evaluate_agent.py --mode llm-autonomous-agent --no-all --exhaustive --scenario verification_exhaustion_closes

For evaluation methodology, metrics, and acceptance criteria, see Evaluation Approach.

Public Interface Contract

SettleSentry exposes a single turn-based interface:

Agent.next(user_input: str) -> {"message": str}

The interface is intentionally simple for evaluation and integration, while the implementation maintains session state, policy checks, graph orchestration, parser context, and external API boundaries internally.
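A harness can drive anything that satisfies this contract. The sketch below uses a hypothetical `EchoAgent` stand-in rather than the real agent, so it says nothing about SettleSentry's constructor or internals; it only shows the turn-based shape of `next` and a scripted driver loop.

```python
from typing import TypedDict


class AgentReply(TypedDict):
    message: str


class EchoAgent:
    """Hypothetical stand-in with the same turn-based shape as Agent.next."""

    def __init__(self) -> None:
        self.turns = 0  # session state lives inside the agent object

    def next(self, user_input: str) -> AgentReply:
        self.turns += 1
        return {"message": f"turn {self.turns}: received {len(user_input)} characters"}


def run_script(agent: EchoAgent, user_turns: list[str]) -> list[str]:
    """Feed scripted user turns to the agent and collect its messages."""
    return [agent.next(text)["message"] for text in user_turns]
```

Because the caller passes only a string and reads only `message`, an evaluator can swap modes or implementations without changing the driver.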

Project Structure

settlesentry/          # Installable package workspace
tests/                 # Unit and workflow tests
scripts/               # Scenario evaluator and helper scripts
docs/                  # Design, evaluation, and sample conversation docs
var/evaluation/        # Generated evaluation reports

For detailed package/module layout, see settlesentry/README.md.

Example Happy Path

USER: Hi
AGENT: Hello, I'm SettleSentry. I help with account verification and payment. Please share your account ID.

USER: ACC1001
AGENT: Account found. Please share your full name exactly as registered on the account.

USER: Nithin Jain
AGENT: Please share one verification factor: DOB in YYYY-MM-DD format, Aadhaar last 4 digits, or pincode.

USER: 1990-05-14
AGENT: Identity verified. Your outstanding balance is INR 1250.75. Please share the amount you would like to pay in INR.

... payment amount, card details, and confirmation collected ...

USER: yes
AGENT: Payment of INR 500.00 was processed successfully. Transaction ID: txn_.... This conversation is now closed.

Full happy-path, failure, retry, side-question, correction, and closure examples are documented in Sample Conversations.

Documentation

Disclaimer

SettleSentry is a technical implementation and reference architecture for a payment collection agent. It is not intended for production payment processing as-is.

A production deployment would require additional security review, PCI-DSS controls, secrets management, persistent session storage, monitoring, audit logging, human escalation, fraud controls, and compliance validation.

Caution

Do not use real payment card data with this project. Use only sample or test payment data.

License

This project is licensed under the BSD 3-Clause License. See LICENSE for details.
