Skip to content

AI Services

The AI Services section configures AI providers for two distinct purposes:

  • Chat AI (Connect) powers AI features in Thirdlane Connect such as composer rewrite (Improve / tone / length), thread summarization, suggested replies, and translation.
  • Recording AI powers post-call analysis on transcribed call recordings (summary, sentiment, categorization, action items, QA score, entity extraction, compliance).

For Text-to-Speech and voice transcription, see Speech Services (next to AI Services in the menu).

Where AI Services are configured

In Configuration Manager, open AI Services under Communications and Services. The grid lists the AI service entries that have been provisioned. Sysadmins can add, edit, and remove entries; tenant admins can pick from existing entries on their own configuration page.

The form’s Purpose dropdown offers:

  • Recording AI — post-call analysis on call recordings.
  • Chat AI (Connect) — in-app AI features in the chat client.

The Provider dropdown offers:

  • Ollama — self-hosted local AI server. Recommended for privacy-sensitive deployments and for testing.
  • OpenAI — hosted OpenAI API (or any OpenAI-compatible endpoint configured via Base URL).
  • Thirdlane — Thirdlane-hosted AI service (subscription option, license-capped).
  • Custom — bring-your-own integration via a script on the server.

Common fields

  • Name — internal identifier for the entry.
  • Description — short note for sysadmins.

Ollama provider

For self-hosted Ollama (local AI):

  • Server URL — e.g. http://localhost:11434. The host and port where Ollama is reachable from the PBX.
  • Model — pulled live from the Ollama server when you select the provider. Recommended models are flagged based on public benchmarks; see the Recommended Models notes shown below the dropdown.

For Recording AI with Ollama, individual analysis services run automatically after transcription:

  • Summary — generates a concise summary of the call.
  • Sentiment — analyzes overall sentiment with a confidence score.
  • Categorization — classifies the call into a category.
  • Action Items — extracts follow-up tasks with assignees and deadlines.
  • QA Score — rates call quality on a 5-point scale across communication dimensions.
  • Entity Extraction — identifies names, organizations, dates, and other key entities.
  • Compliance — checks for required disclosures and flags potential compliance issues.

For Chat AI (Connect) with Ollama, four feature toggles control which composer features the service powers:

  • Enable Composer Rewrite — Improve, tone (Casual / Professional / Confident / Enthusiastic), and length (Make shorter / Make longer).
  • Enable Thread Summarize — on-demand summary of a chat or channel.
  • Enable Suggested Replies — one-tap reply suggestions from visible context.
  • Enable Translate Draft — translate a draft message into a target language.

Disabling a toggle removes the matching control from the chat composer for tenants that pick this service.

Hardware and model requirements (Ollama)

Composer rewrite is sensitive to model capability — small models (under ~1.5B parameters) tend to treat input as a chat turn to answer rather than a string to transform. Recording AI features (summary, action items, QA score, etc.) tolerate weaker models because they consume larger inputs and have more pattern to lock onto, but the chat composer floor is higher.

TierModelRAM / VRAMThroughputVerdict
Floor (CPU)qwen2.5:1.5b Q4~1.5 GB~120 TPS CPUBorderline. Works for short messages, occasional speaker flips on request-shaped inputs. Pick only for resource-starved hosts.
Recommended (CPU)qwen2.5:3b Q4~3 GB~40 TPS CPUReliable across all 7 rewrite styles. Production default when no GPU is available.
Recommended (CPU, alt)llama3.2:3b Q4~3 GB~35 TPS CPUComparable to qwen2.5:3b with more conservative tone shifts. Pick if Qwen’s output reads too informal.
Recommended (GPU)qwen2.5:7b Q4~5 GB VRAM~50 TPS small GPUQuality step over the 3B tier on longer drafts and multilingual content. Requires any modern GPU (e.g. RTX 3060).
Avoidqwen2.5:0.5b, tinyllama, gemma:2b, phi3.5Sub-1.5B parameter count or known quality issues. The picker shows an “Avoid” warning.

The Connect AI service picker fetches this list live from your Ollama host and annotates each model with its tier, recommendation, or avoid reason. The default model is qwen2.5:3b so a CPU-only deployment that doesn’t explicitly choose a model gets a sensible fallback rather than a 7B model that pages on every request.

Minimum host: 4 GB free RAM and 4+ vCPU for qwen2.5:1.5b-3b. With a GPU, >=6 GB VRAM lets you run qwen2.5:7b Q4 comfortably.

OpenAI provider

For hosted OpenAI or any OpenAI-compatible endpoint:

  • API Key — the OpenAI secret key. Stored as a password and never echoed back.
  • Base URL — defaults to https://api.openai.com/v1. Override to point at Azure OpenAI, a self-hosted gateway, or a local OpenAI-compatible server.
  • Organization — optional OpenAI organization ID.
  • Default Model — default chat completion model (e.g. gpt-4o-mini). Used when the request does not specify a model.

The same four Chat AI feature toggles apply.

Thirdlane provider

For the Thirdlane-hosted AI subscription:

  • Access Key ID and Secret Access Key — credentials issued by Thirdlane.
  • Connect AI Monthly Limit (requests) — per-tenant cap on AI requests per calendar month for tenants assigned to this entry. Set to 0 for unlimited (or to whatever your license allows).

The same four Chat AI feature toggles apply. The Thirdlane provider is preferred when you want a turnkey AI offering without managing your own model infrastructure.

Custom provider

Define a local script that runs at request time. Useful for connecting to internal services or wrapping a custom inference stack. The script lives under a controlled directory on the server, takes structured input on stdin, and returns JSON on stdout.

Service profiles — multiple entries per AI offering

The grid is intentionally a flat list, so sysadmins can create multiple entries with different feature sets. Examples:

Service NameProviderToggles enabledUse case
Acme Connect AI PremiumOpenAIRewrite + Summarize + Suggest + TranslatePremium tenants
Acme Connect AI BasicOllamaRewrite + SummarizeStandard tenants
Acme Recording AI FullOllamaSummary + Sentiment + Categorization + Action Items + QA + Entities + ComplianceRegulated industries
Acme Recording AI LightOllamaSummary + SentimentStandard organizations

All entries can share the same underlying server and model; they differ only in which feature toggles are enabled. To change what a tenant receives, simply switch their service assignment.

Tenant assignment

In Tenants -> Edit -> Connect AI Service (sysadmins) or Tenant -> Edit -> Connect AI Service (tenant admin), pick the AI Services entry that should serve this tenant’s users. Tenants follow the global default unless explicitly overridden.

The Connect AI Monthly Limit field on the tenant form caps usage for tenants on Thirdlane-hosted services (overrides the service-level default if set).

Per-user override

User Extensions have a Connect AI Access field with three states:

  • Inherit from tenant — follow the tenant’s master switch.
  • Allow — force AI on for this user even if the tenant has it disabled.
  • Deny — hide all Connect AI features for this user even when the tenant has it enabled.

This lets organizations pilot the feature with a small group, or exclude specific roles from AI features.

Usage limits and reporting

Each tenant on a Thirdlane-hosted service has a Connect AI Monthly Limit that caps the total number of AI requests per calendar month. Each individual feature invocation counts as one request — a single Improve action is 1 request, a single thread summary is 1 request.

Current usage and the next reset date are visible in the Tenants grid under the Connect AI columns in the advanced view.

For Recording AI, usage is counted per analysis — a single call processed with Summary, Sentiment, and Categorization enabled counts as 3 requests against the tenant’s AI cap.

How composer rewrite works

When a Connect user clicks the Improve / tone / length chip in the chat composer, the front-end calls the PBX’s /facade/connect-ai/rewrite endpoint with the draft text and a style identifier (e.g. professional, shorten). The PBX picks the appropriate AI Services entry for the tenant, builds a structured chat-completion request, and returns the rewritten text.

Per-style minimum input length. Each style has a minimum word count:

StyleMinimum wordsWhy
Improve1Lowest-friction polish; safe even on a single word.
Casual / Professional / Confident / Enthusiastic3Tone shifts on shorter inputs tend to confabulate context.
Make longer5Below this, the model has nothing to expand on without inventing content.
Shorten8Nothing to shorten if the input is already short.

When the draft is below the threshold, the chip is disabled in the UI with a tooltip (“Add a few more words…”), and the server-side enforces the same threshold defensively (returns rewrite_input_too_short). This prevents the most common small-model failure mode — generating plausible-sounding content that has no basis in the user’s actual draft.

Prompt engineering. Each style ships with a system prompt and one neutral, topic-free few-shot example. The prompts include explicit anti-confabulation rules (“NEVER invent topics, recipients, deadlines…”), anti-leakage rules (“DO NOT copy specific topic, words, names from the example…”), and speaker-direction rules (“You are reformulating the user’s message, NOT replying to it…”). User input is wrapped in a REWRITE: task prefix so smaller models do not interpret a request-shaped draft (“can you send me the report?”) as a chat turn to answer.

Why the model floor matters. Models below ~1.5B parameters produce conversational replies instead of rewrites no matter how well the prompt is engineered. The recommended floor (qwen2.5:1.5b) holds direction on most inputs; the production sweet spot (qwen2.5:3b or llama3.2:3b) survives our internal sanity pass on all 7 styles with no speaker flips and no confabulation.

See also