SettleSentry is a conversational payment collection agent for services where customers may have an outstanding amount due, such as cloud bills, mobile plans, subscriptions, or other recurring service balances. It verifies the customer first, shows the amount due only after verification, and guides payment collection through a controlled, policy-governed workflow.
Note
SettleSentry guides payment collection to closure in under 9 user turns, with an average automated completion time of 1 min 14 sec, full policy compliance, 0 PII leaks, and no premature payment calls.
The core design principle is separation of conversation intelligence from payment authority:
- Deterministic workflow and policy gates control verification, balance disclosure, payment confirmation, and payment execution.
- LLMs can be used progressively: for parsing, response phrasing, or autonomous tool orchestration.
- Even in autonomous mode, the LLM does not own payment authority; it can only call phase-scoped tools backed by deterministic operations and policy checks.
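
As a rough illustration of phase-scoped tools, the sketch below exposes a different tool set per workflow phase and rejects any call outside the current phase. The phase names, tool names, and the `call_tool` helper are hypothetical and are not taken from the SettleSentry codebase.

```python
from enum import Enum


class Phase(Enum):
    ACCOUNT = "account"
    VERIFICATION = "verification"
    AMOUNT = "amount"
    CARD = "card"
    CONFIRMATION = "confirmation"


# Hypothetical tool names, grouped by the phase in which they may be called.
TOOLS_BY_PHASE = {
    Phase.ACCOUNT: {"lookup_account"},
    Phase.VERIFICATION: {"submit_identity"},
    Phase.AMOUNT: {"set_payment_amount"},
    Phase.CARD: {"submit_card_details"},
    Phase.CONFIRMATION: {"confirm_payment", "cancel_payment"},
}


def allowed_tools(phase: Phase) -> set[str]:
    """Return only the tools the LLM may see in the given phase."""
    return set(TOOLS_BY_PHASE[phase])


def call_tool(phase: Phase, tool_name: str) -> str:
    """Reject any tool call that falls outside the current phase."""
    if tool_name not in TOOLS_BY_PHASE[phase]:
        return "rejected"
    return "accepted"
```

With this shape, an LLM that asks for `confirm_payment` during the account phase gets a rejection from deterministic code, regardless of how the request was phrased.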
Payment collection is a sensitive workflow. The agent must maintain multi-turn context, avoid premature tool calls, handle partial or out-of-order input, enforce identity verification, recover safely from failures, and protect sensitive identity and payment data.
SettleSentry demonstrates how this workflow can be automated without giving uncontrolled authority to an LLM. Language models can help interpret user input and phrase responses, while verification, balance disclosure, payment authorization, and API execution remain controlled by deterministic workflow and policy logic.
- Multi-turn account verification and payment collection
- Strict identity verification before balance disclosure
- Policy-gated amount validation, card collection, and payment execution
- Explicit confirmation before any payment API call
- Recovery flows for verification, amount, card, API, cancellation, and terminal failure cases
- Progressive LLM integration: parser, responder, and autonomous tool-calling modes with deterministic fallback boundaries
- Four-mode ablation design: deterministic workflow, LLM parser, LLM parser/responder, and LLM autonomous tool orchestration
- LLM-led autonomous mode over phase-scoped account, identity, amount, card, confirmation, lifecycle, and safety tools
- Safety audit and deterministic fallback for autonomous LLM responses
- Scenario filtering and exhaustive all-mode evaluation support
- Scenario evaluator covering success, recovery, guardrail, correction, and closure paths
- Evaluation-compatible interface
flowchart TD
U[User Message] --> I[Agent Interface]
I --> G[LangGraph Orchestration]
G --> P[Input Understanding / Tool Layer]
P --> S[Conversation State]
S --> R{Routing + Policy Gates}
R -->|Needs More Information| Q[Ask Next Required Field]
R -->|Account Lookup Allowed| L[Lookup Account API]
R -->|Verification Ready| V[In-Agent Identity Verification]
R -->|Amount and Card Details Ready| C[Prepare Payment Confirmation]
C --> K{Explicit User Confirmation?}
K -->|Yes + Policy Allowed| X[Process Payment API]
K -->|No / Cancel| Z[Close Safely]
R -->|Terminal or Unsafe to Continue| Z
Q --> M[Safe Response Context]
L --> M
V --> M
C --> M
X --> M
Z --> M
M --> W[Agent Response / Safety Fallback]
W --> A[User-Facing Message]
Each user message is processed as one controlled workflow turn. The agent preserves structured state and recent context for short replies, corrections, retries, and out-of-order inputs, while deterministic policy gates control account lookup, verification, balance disclosure, confirmation, and payment execution across all modes.
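
One way to picture how out-of-order input can be remembered while the agent still asks for the next missing field is the sketch below. The field names and their order are assumptions based on the sample conversation in this README, not the project's actual state schema.

```python
from typing import Optional

# Assumed collection order, inferred from the sample conversation.
REQUIRED_FIELDS = [
    "account_id",
    "full_name",
    "verification_factor",
    "amount",
    "card_details",
    "confirmation",
]


def merge_turn(state: dict, parsed: dict) -> dict:
    """Keep earlier values and add any fields extracted this turn, in any order."""
    merged = dict(state)
    merged.update({key: value for key, value in parsed.items() if value is not None})
    return merged


def next_required_field(state: dict) -> Optional[str]:
    """Return the first field that is still missing, or None when all are present."""
    for field in REQUIRED_FIELDS:
        if state.get(field) is None:
            return field
    return None
```

A user who volunteers the amount on the first turn has it stored, but the next prompt still asks for the account ID because earlier fields are missing.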
For the full architecture, policy model, assumptions, and tradeoffs, see the Design Document.
SettleSentry keeps payment authority outside the LLM:
- Balance is shown only after successful identity verification.
- Payment amount is validated before card collection.
- Payment processing requires valid payment details and explicit confirmation.
- All payment-critical transitions pass deterministic policy checks.
- Full card number and CVV are cleared after success, terminal failure, cancellation, or closure.
- Out-of-order user input may be remembered, but policy gates still control sensitive actions.
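
The rules above can be sketched as small deterministic predicates over session state. This is a minimal sketch under stated assumptions: the `SessionState` fields and function names are hypothetical, and the amount check (positive and not above the balance) is an assumed validation rule rather than the project's documented one.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass
class SessionState:
    verified: bool = False
    balance: Optional[Decimal] = None
    amount: Optional[Decimal] = None
    card_number: Optional[str] = None
    cvv: Optional[str] = None
    confirmed: bool = False


def can_disclose_balance(state: SessionState) -> bool:
    """Balance is shown only after successful identity verification."""
    return state.verified


def can_collect_card(state: SessionState) -> bool:
    """Card collection requires a verified user and a validated amount."""
    if not state.verified or state.balance is None or state.amount is None:
        return False
    # Assumed rule: the amount must be positive and not exceed the balance.
    return Decimal("0") < state.amount <= state.balance


def can_process_payment(state: SessionState) -> bool:
    """Payment requires valid details and explicit confirmation."""
    has_card = bool(state.card_number) and bool(state.cvv)
    return can_collect_card(state) and has_card and state.confirmed


def clear_sensitive(state: SessionState) -> None:
    """Drop the full card number and CVV once the flow ends."""
    state.card_number = None
    state.cvv = None
```

Because each predicate reads only structured state, an out-of-order message can fill in a field early without unlocking the action that field belongs to.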
For detailed safety rules and workflow decisions, see the Design Document.
The CLI supports four modes:
| Mode | Input Understanding | Response Writing | Tool / Workflow Control | Use Case |
|---|---|---|---|---|
| deterministic-workflow | Deterministic parser | Deterministic responses | LangGraph workflow routing | Stable no-LLM baseline |
| llm-parser-workflow | LLM parser with deterministic fallback | Deterministic responses | LangGraph workflow routing | Flexible extraction with fixed response behavior |
| llm-parser-responder-workflow | LLM parser with deterministic fallback | LLM responder with deterministic fallback | LangGraph workflow routing | Natural extraction and response phrasing |
| llm-autonomous-agent | LLM interprets the turn | LLM-written response with safety audit/fallback | LLM tool selection over phase-scoped tools | Autonomous agent ablation mode |

The default CLI mode is llm-autonomous-agent. Use deterministic-workflow when no OpenRouter API key is configured.
In every mode, payment authority remains deterministic and policy-controlled. The LLM does not verify identity, authorize balance disclosure, bypass policy gates, or process payment without explicit confirmation.
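
The "LLM parser with deterministic fallback" pattern in the table can be sketched as follows. The function names are hypothetical, the account ID pattern is an assumption based on the `ACC1001` example in this README, and `llm_parse` stands in for whatever LLM call the project actually makes.

```python
import re
from typing import Callable, Optional

# Assumed account ID format, inferred from the sample conversation.
ACCOUNT_ID_PATTERN = re.compile(r"\bACC\d+\b", re.IGNORECASE)


def deterministic_parse(text: str) -> dict:
    """Extract an account ID with a fixed pattern; no LLM involved."""
    match = ACCOUNT_ID_PATTERN.search(text)
    return {"account_id": match.group(0).upper()} if match else {}


def parse_with_fallback(
    text: str,
    llm_parse: Optional[Callable[[str], dict]] = None,
) -> dict:
    """Try the LLM parser first; fall back to the deterministic parser on any failure."""
    if llm_parse is not None:
        try:
            result = llm_parse(text)
            if isinstance(result, dict):
                return result
        except Exception:
            # Timeouts, malformed output, or provider errors all land here.
            pass
    return deterministic_parse(text)
```

Either way, the parser only produces candidate field values; it never decides whether a sensitive action may run.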
- Python 3.12
- LangGraph for workflow orchestration
- Pydantic and Pydantic Settings for schema/configuration validation
- PydanticAI with OpenRouter for optional LLM parser, responder, and autonomous tool-orchestration behavior
- HTTPX and Tenacity for API communication and retry handling
- Typer and Rich for interactive CLI
- Pytest for unit and workflow test coverage
- uv for environment and execution management
From the repository root:
uv sync --all-packages
LLM configuration is optional for deterministic-workflow and required for llm-parser-workflow, llm-parser-responder-workflow, and llm-autonomous-agent.
Start by copying the template and updating values for your environment:
# macOS/Linux
cp .env.example .env
# PowerShell
Copy-Item .env.example .env
Full template: .env.example
| Variable | Required | Default | Description |
|---|---|---|---|
| OPENROUTER_API_KEY | LLM modes only | unset | OpenRouter API key for LLM-enabled modes. |
| OPENROUTER_ENABLED | No | true | Enables OpenRouter-backed parser/responder/autonomous runtime. |
| OPENROUTER_BASE_URL | No | https://openrouter.ai/api/v1 | OpenRouter API base URL. |
| OPENROUTER_MODEL | No | openrouter/free | OpenRouter model identifier. |
| OPENROUTER_TIMEOUT_SECONDS | No | 10 | LLM request timeout in seconds. |
| OPENROUTER_TEMPERATURE | No | 0.0 | LLM temperature for response variability. |
| OPENROUTER_MAX_TOKENS | No | 300 | Max tokens for LLM outputs. |
| OPENROUTER_RETRIES | No | 1 | Retry count for LLM calls. |
| API_BASE_URL | No | https://example-payment-verification-api.local | External payment/lookup API base URL. |
| API_TIMEOUT_SECONDS | No | 30 | API timeout in seconds. |
| API_MAX_RETRIES | No | 2 | Retry count for API calls. |
| AGENT_POLICY_VERIFICATION_MAX_ATTEMPTS | No | 3 | Max identity verification attempts before closure. |
| AGENT_POLICY_PAYMENT_MAX_ATTEMPTS | No | 3 | Max payment attempts before closure. |
| AGENT_POLICY_ALLOW_PARTIAL_PAYMENTS | No | true | Allows partial payment amounts. |
| AGENT_POLICY_ALLOW_ZERO_BALANCE_PAYMENT | No | false | Allows payment flow for zero-balance accounts. |
| AGENT_POLICY_MAX_PAYMENT_AMOUNT | No | unset | Optional hard cap across payment amounts. |

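
To show how the `AGENT_POLICY_*` variables and their defaults might map onto a typed settings object, here is a dependency-free sketch. The project itself uses Pydantic Settings; the `PolicySettings` class and `load_policy` function below are hypothetical and use only the standard library.

```python
import os
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


def _as_bool(value: str) -> bool:
    """Interpret common truthy strings from environment variables."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class PolicySettings:
    verification_max_attempts: int = 3
    payment_max_attempts: int = 3
    allow_partial_payments: bool = True
    allow_zero_balance_payment: bool = False
    max_payment_amount: Optional[Decimal] = None


def load_policy(env: Mapping[str, str] = os.environ) -> PolicySettings:
    """Read AGENT_POLICY_* values, using the documented defaults when unset."""
    prefix = "AGENT_POLICY_"
    cap = env.get(prefix + "MAX_PAYMENT_AMOUNT")
    return PolicySettings(
        verification_max_attempts=int(env.get(prefix + "VERIFICATION_MAX_ATTEMPTS", "3")),
        payment_max_attempts=int(env.get(prefix + "PAYMENT_MAX_ATTEMPTS", "3")),
        allow_partial_payments=_as_bool(env.get(prefix + "ALLOW_PARTIAL_PAYMENTS", "true")),
        allow_zero_balance_payment=_as_bool(env.get(prefix + "ALLOW_ZERO_BALANCE_PAYMENT", "false")),
        max_payment_amount=Decimal(cap) if cap else None,
    )
```

Passing an explicit mapping, for example `load_policy({"AGENT_POLICY_MAX_PAYMENT_AMOUNT": "5000"})`, makes the loader easy to exercise in tests without touching the real environment.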
| Variable | Required | Default | Description |
|---|---|---|---|
| LOG_LEVEL | No | INFO | Application log level. |
| LOG_FILE_ENABLED | No | true | Enables file logging. |
| LOG_CONSOLE_ENABLED | No | true | Enables console logging. |
| LOG_FILE_NAME | No | unset | Optional explicit log filename (defaults to <project_name>.log). |
| LOG_MAX_BYTES | No | 2048000 | Max log file size before rotation. |
| LOG_BACKUP_COUNT | No | 5 | Number of rotated log files to retain. |

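
For readers unfamiliar with size-based rotation, the sketch below shows how `LOG_MAX_BYTES` and `LOG_BACKUP_COUNT` would typically map onto the standard library's `RotatingFileHandler`. Whether SettleSentry wires its logging this way is an assumption; the `build_file_handler` helper and the example filename are hypothetical.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_file_handler(
    path: str,
    max_bytes: int = 2048000,
    backup_count: int = 5,
    level: str = "INFO",
) -> RotatingFileHandler:
    """Create a rotating file handler using the documented default values."""
    handler = RotatingFileHandler(
        path,
        maxBytes=max_bytes,        # rotate once the file reaches this size
        backupCount=backup_count,  # keep this many rotated files
        delay=True,                # do not create the file until the first record
    )
    handler.setLevel(getattr(logging, level.upper()))
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s"))
    return handler
```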
| Variable | Required | Default | Description |
|---|---|---|---|
| EVAL_REPORT_RETENTION | No | 10 | Number of dated evaluation reports to keep. |
| EVAL_LOCAL_REPEATS_DEFAULT | No | 1 | Default repeat count for deterministic mode runs. |
| EVAL_LLM_REPEATS_DEFAULT | No | 1 | Default repeat count for LLM mode runs. |
| EVAL_SCENARIO_RETRIES_DEFAULT | No | 1 | Default per-scenario retry count in evaluator. |
| EVAL_REPORT_WIDTH | No | 160 | Console/report rendering width for evaluator output. |
# Deterministic workflow
uv run settlesentry chat --mode deterministic-workflow
# LLM parser with deterministic responses
uv run settlesentry chat --mode llm-parser-workflow
# LLM parser and LLM-written responses
uv run settlesentry chat --mode llm-parser-responder-workflow
# LLM autonomous tool-calling agent
uv run settlesentry chat --mode llm-autonomous-agent
# Show privacy-safe state after each turn
uv run settlesentry chat --mode llm-autonomous-agent --show-state
# Enable console debug logs
uv run settlesentry chat --mode llm-autonomous-agent --debug-logs
If no OpenRouter API key is configured, use deterministic-workflow mode.
SettleSentry can also be run from the published GitHub Container Registry image.
Pull the latest image:
docker compose pull
Run deterministic workflow mode, which does not require an LLM API key:
docker compose run --rm settlesentry
Run autonomous LLM tool-calling mode with OpenRouter configured in .env:
docker compose --profile llm run --rm settlesentry-autonomous
Example .env for LLM modes:
OPENROUTER_API_KEY=...
OPENROUTER_ENABLED=true
The Compose setup uses the public GHCR image by default:
ghcr.io/kayvanshah1/settlesentry-payment-collection-agent:latest
To build locally instead of pulling the published image:
docker compose -f compose.yaml -f compose.build.yaml run --rm settlesentry
Run the local build in autonomous mode:
docker compose -f compose.yaml -f compose.build.yaml --profile llm run --rm settlesentry-autonomous
Run the core test suite:
uv run pytest -q
Run mode-specific evaluation:
# Deterministic baseline: no LLM dependencies.
uv run python scripts/evaluate_agent.py --no-all --mode deterministic-workflow
# Hybrid mode: LLM parser with deterministic response generation.
uv run python scripts/evaluate_agent.py --no-all --mode llm-parser-workflow
# Hybrid mode: LLM parser and LLM responder (deterministic fallback remains available).
uv run python scripts/evaluate_agent.py --no-all --mode llm-parser-responder-workflow
# Autonomous mode: LLM-led, phase-scoped tool orchestration with safety/fallback controls.
uv run python scripts/evaluate_agent.py --no-all --mode llm-autonomous-agent
# Full exhaustive run: execute the complete scenario matrix across all configured modes.
uv run python scripts/evaluate_agent.py --all --exhaustive
Run a targeted autonomous scenario:
uv run python scripts/evaluate_agent.py --mode llm-autonomous-agent --no-all --exhaustive --scenario verification_exhaustion_closes
For evaluation methodology, metrics, and acceptance criteria, see Evaluation Approach.
SettleSentry exposes a single turn-based interface:
Agent.next(user_input: str) -> {"message": str}
The interface is intentionally simple for evaluation and integration, while the implementation maintains session state, policy checks, graph orchestration, parser context, and external API boundaries internally.
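
A harness built against this interface only needs an object with a `next` method that returns a dict containing `message`. The sketch below uses a hypothetical `EchoAgent` stand-in, since the real agent's import path and constructor arguments are not shown in this README.

```python
from typing import Protocol


class TurnAgent(Protocol):
    """Structural type for anything exposing the single-turn interface."""

    def next(self, user_input: str) -> dict: ...


class EchoAgent:
    """Hypothetical stand-in that keeps per-session state between turns."""

    def __init__(self) -> None:
        self.turns = 0

    def next(self, user_input: str) -> dict:
        self.turns += 1
        return {"message": f"Turn {self.turns}: received {user_input!r}"}


def run_script(agent: TurnAgent, user_turns: list[str]) -> list[str]:
    """Feed scripted user turns to the agent and collect its replies."""
    replies = []
    for text in user_turns:
        reply = agent.next(text)
        replies.append(reply["message"])
    return replies
```

Swapping `EchoAgent` for the real agent should leave `run_script` unchanged, which is the point of keeping the interface this small.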
settlesentry/ # Installable package workspace
tests/ # Unit and workflow tests
scripts/ # Scenario evaluator and helper scripts
docs/ # Design, evaluation, and sample conversation docs
var/evaluation/ # Generated evaluation reports
For detailed package/module layout, see settlesentry/README.md.
USER: Hi
AGENT: Hello, I'm SettleSentry. I help with account verification and payment. Please share your account ID.
USER: ACC1001
AGENT: Account found. Please share your full name exactly as registered on the account.
USER: Nithin Jain
AGENT: Please share one verification factor: DOB in YYYY-MM-DD format, Aadhaar last 4 digits, or pincode.
USER: 1990-05-14
AGENT: Identity verified. Your outstanding balance is INR 1250.75. Please share the amount you would like to pay in INR.
... payment amount, card details, and confirmation collected ...
USER: yes
AGENT: Payment of INR 500.00 was processed successfully. Transaction ID: txn_.... This conversation is now closed.
Full happy-path, failure, retry, side-question, correction, and closure examples are documented in Sample Conversations.
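
The verification prompt in the conversation above accepts a DOB in YYYY-MM-DD format, Aadhaar last 4 digits, or a pincode. A deterministic classifier for that input could look like the sketch below. The function name is hypothetical, and the 6-digit pincode length is an assumption based on Indian postal codes, not something this README states.

```python
import re
from datetime import date
from typing import Optional


def classify_factor(text: str) -> Optional[tuple[str, str]]:
    """Return (factor_type, value) for a recognised verification factor, else None."""
    value = text.strip()
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        try:
            date.fromisoformat(value)  # rejects impossible dates such as 1990-02-30
        except ValueError:
            return None
        return ("dob", value)
    if re.fullmatch(r"\d{4}", value):
        return ("aadhaar_last4", value)
    if re.fullmatch(r"\d{6}", value):  # assumed 6-digit pincode
        return ("pincode", value)
    return None
```

Classifying the factor is separate from verifying it: the comparison against the account record would still happen inside the deterministic verification step.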
- Design Document
- Evaluation Approach
- Autonomous Agent Mode
- Sample Conversations
- Implementation Notes
- Design Rationale
- Package Layout
SettleSentry is a technical implementation and reference architecture for a payment collection agent. It is not intended for production payment processing as-is.
A production deployment would require additional security review, PCI-DSS controls, secrets management, persistent session storage, monitoring, audit logging, human escalation, fraud controls, and compliance validation.
Caution
Do not use real payment card data with this project. Use only sample or test payment data.
This project is licensed under the BSD 3-Clause License. See LICENSE for details.
