VOICE AI LEARNING ASSISTANT

From Static Finance Content to a Searchable AI Learning Platform

The client had high-quality courses, Synthesia video lessons, and a 500+ page financial modeling book. Members still had to search manually. Here is how we built a retrieval and voice AI layer across all three content sources.

Company: The Investment Analyst

Content sources: 2,800+

Pages ingested: 500+

Duration: ~6 weeks

Before: Members browsed courses manually with no semantic understanding
After: Hybrid keyword and semantic search across platform content

Before: Video knowledge stayed locked inside Synthesia transcripts
After: Transcripts pulled, cleaned, and indexed automatically

Before: A 500+ page book existed only as a static PDF
After: Book content extracted, chunked, embedded, and stored in Qdrant

Before: No AI assistant existed for member Q&A
After: ElevenLabs voice agent answers from proprietary training material

Before: Course and video updates required manual operational work
After: Automated LearnWorlds and Synthesia sync pipelines keep content current

Before: No financial advice controls were needed because there was no AI layer
After: Explicit guardrails keep the agent educational, not advisory

Discovery:

Understanding the Content Before Designing the AI Layer

We mapped how members were expected to learn today, where the content lived, and what LearnWorlds, Synthesia, Algolia, ElevenLabs, and Qdrant could realistically support.

What we learned:

Three useful content sources existed, but none of them worked together. LearnWorlds held course structure, Synthesia held video scripts, and the 500+ page PDF held the deepest reference material.

Course content was structured, but not semantically searchable. Members could browse pages and pathways, but they could not ask questions in natural language and get directed to the right lesson.

Video transcripts were valuable, but isolated. Synthesia contained spoken instructional content, but LearnWorlds had no native transcript layer or reliable mapping between courses and videos.

The PDF needed its own retrieval pipeline. The Financial Modelling Mastery book was too important to treat as a static file. It needed extraction, cleaning, chunking, embeddings, and vector storage before an AI agent could use it.

Two decisions shaped the entire system:

Build a decoupled content architecture

Pull content from each source independently

Avoid manual course-to-video mapping

Use automated sync pipelines instead of an admin curation layer

Reduce maintenance overhead for future course and video updates

Treat the PDF as the foundation for book intelligence

Build a reusable PDF ingestion pipeline

Clean noisy textbook content before embedding

Store semantic chunks in Qdrant

Validate retrieval through real finance-related test queries

This approach let us move fast without forcing LearnWorlds, Synthesia, and the PDF book into one brittle content model. Each source kept its structure. The intelligence layer made them searchable, retrievable, and usable by the AI agent.

The three knowledge layers

LearnWorlds Course Layer

Course titles

Descriptions

Learning pathways

DATA SOURCE

LearnWorlds API

System Role

Synced into Algolia for semantic search and recommendations.

Synthesia Video Layer

Video metadata

Scripts

Clean transcripts

DATA SOURCE

Synthesia API

System Role

Converts video scripts into searchable learning content, so spoken lessons can be discovered through semantic search and surfaced by the AI agent.
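
As a rough illustration of what "clean transcripts" can involve, the sketch below strips caption timing metadata from a transcript. It assumes the transcript arrives as caption-style text (WebVTT or SRT-like cue numbers and `-->` timing lines); the actual Synthesia export format is not described here, and `clean_transcript` is a hypothetical helper name.

```python
import re

# Matches caption timing lines such as "00:00:01.000 --> 00:00:04.200".
# SRT files use a comma before the milliseconds, so both separators are accepted.
TIMING_LINE = re.compile(
    r"^\s*(\d{2}:)?\d{2}:\d{2}[.,]\d{3}\s*-->\s*(\d{2}:)?\d{2}:\d{2}[.,]\d{3}.*$"
)
# Matches bare cue numbers ("1", "2", ...) that precede timing lines in SRT files.
# Caveat: a spoken line consisting only of digits would also be dropped.
CUE_NUMBER = re.compile(r"^\s*\d+\s*$")


def clean_transcript(raw: str) -> str:
    """Strip caption timing metadata and return the spoken text as one paragraph."""
    spoken = []
    for line in raw.splitlines():
        stripped = line.strip()
        if not stripped or stripped == "WEBVTT":
            continue
        if TIMING_LINE.match(stripped) or CUE_NUMBER.match(stripped):
            continue
        spoken.append(stripped)
    # Collapse whitespace left over from joining cue fragments.
    return re.sub(r"\s+", " ", " ".join(spoken)).strip()
```

The cleaned paragraph can then be indexed alongside the video metadata, so a search hit lands on readable lesson text rather than caption markup.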

Financial Modelling Book Layer

500-page PDF

Cleaned text chunks

Embedded book knowledge

DATA SOURCE

Financial Modelling Mastery PDF

System Role

Turns the 500+ page PDF into a retrievable knowledge base, so the AI agent can answer detailed finance questions from the book instead of relying on generic model knowledge.

WHAT WE BUILT

A unified AI intelligence layer across three content sources.

We built four connected components that turned The Investment Analyst's static learning content into a searchable, AI-assisted member experience.

PDF Ingestion Pipeline

Built a reusable Python pipeline to extract, clean, chunk, embed, and store the 500+ page Financial Modelling Mastery book in Qdrant.

Page-level text extraction
Noise filtering for headers, TOC entries, captions, blank pages, and copyright text
Semantic chunking for retrieval quality
OpenAI embedding generation
Qdrant storage
Test-query validation through the agent
Why it mattered: The book became retrievable by the AI assistant instead of sitting as a static PDF.
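
The noise-filtering step can be sketched roughly as follows. This is a minimal illustration, not the production pipeline: it assumes page text has already been extracted into one string per page, and the regular expressions, the 50% repetition threshold, and the function names are all assumptions made for the example.

```python
import re
from collections import Counter

# Illustrative patterns; a real book would need patterns tuned to its layout.
TOC_ENTRY = re.compile(r"\.{3,}\s*\d+\s*$")  # "DCF Valuation ........ 12"
COPYRIGHT = re.compile(r"©|\bcopyright\b|all rights reserved", re.IGNORECASE)
PAGE_NUMBER = re.compile(r"^\s*(page\s+)?\d+\s*$", re.IGNORECASE)
CAPTION = re.compile(r"^\s*(figure|table|exhibit)\s+\d+[.:]", re.IGNORECASE)


def find_repeated_lines(pages: list[str], min_share: float = 0.5) -> set[str]:
    """Treat lines that appear on at least `min_share` of pages as running headers or footers."""
    counts = Counter()
    for page in pages:
        # Count each distinct line once per page.
        counts.update({line.strip() for line in page.splitlines() if line.strip()})
    threshold = max(2, int(len(pages) * min_share))
    return {line for line, n in counts.items() if n >= threshold}


def clean_pages(pages: list[str]) -> list[str]:
    """Remove headers, TOC entries, captions, page numbers, and copyright lines."""
    repeated = find_repeated_lines(pages)
    cleaned = []
    for page in pages:
        kept = [
            line.strip()
            for line in page.splitlines()
            if line.strip()
            and line.strip() not in repeated
            and not TOC_ENTRY.search(line)
            and not COPYRIGHT.search(line)
            and not PAGE_NUMBER.match(line)
            and not CAPTION.match(line)
        ]
        if kept:  # pages that are blank after filtering are dropped entirely
            cleaned.append("\n".join(kept))
    return cleaned
```

Detecting headers by repetition across pages, rather than by a fixed string, is one way to keep the filter reusable for other books.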

AI Agent

Built a member-facing AI tutor using ElevenLabs Agents v2.0 and the client's professional voice clone.

Voice-enabled member Q&A
Answers grounded in proprietary content
References and links back to courses, videos, and book material
Guardrails against personalised financial advice
LearnWorlds widget / iframe embedding
Paywall-compatible access
Why it mattered: Members could ask questions naturally and get guided to the right learning material.

Algolia Semantic Search & Retrieval

Configured Algolia as the discovery layer for LearnWorlds courses and Synthesia video content.

Hybrid keyword + semantic search
Course title, description, and pathway indexing
Video transcript indexing
Investment-domain index structure
Algolia Recommendation API for course suggestions
Smart search inside LearnWorlds
Why it mattered: Courses and video knowledge became searchable from one place.

Automated Content Sync Pipelines

Built automated pipelines to keep LearnWorlds and Synthesia content current inside the search and AI layer.

LearnWorlds course sync
Synthesia video metadata sync
Transcript extraction and timestamp cleanup
Scheduled and event-driven indexing
Update and deletion handling in Algolia
Why it mattered: New, updated, or removed content flowed into the system without manual operational work.
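
One way to handle updates and deletions is to compare a content fingerprint of each source record against what was last indexed. The sketch below illustrates that idea only; it assumes each record carries an `objectID` (Algolia's record identifier) and that fingerprints of indexed records are stored somewhere the sync job can read. `plan_sync`, `SyncPlan`, and `fingerprint` are hypothetical names, and the call that pushes the plan to Algolia is left out.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(record: dict) -> str:
    """Stable hash of a record's content, used to detect changes between syncs."""
    payload = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class SyncPlan:
    upserts: list[dict] = field(default_factory=list)   # new or changed records
    deletions: list[str] = field(default_factory=list)  # objectIDs no longer in the source


def plan_sync(source_records: list[dict], indexed_fingerprints: dict[str, str]) -> SyncPlan:
    """Compare the current source content with the last indexed state."""
    plan = SyncPlan()
    seen = set()
    for record in source_records:
        object_id = record["objectID"]
        seen.add(object_id)
        # A missing or different fingerprint means the record is new or was edited.
        if indexed_fingerprints.get(object_id) != fingerprint(record):
            plan.upserts.append(record)
    # Anything indexed but absent from the source was removed upstream.
    plan.deletions = sorted(set(indexed_fingerprints) - seen)
    return plan
```

Because unchanged records are skipped, a scheduled run that finds no edits produces an empty plan and no indexing work.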

The Details That Made It Production-Ready

01

Chunking Strategy for the PDF

Raw PDF extraction created too much noise for reliable retrieval. The book included tables, formulae, repeated headers, captions, footnotes, and copyright text.

We solved this with:

Multi-stage text cleaning
Removal of non-informational content
Semantic chunks instead of fixed character windows
Finance-specific retrieval tests
Tech stack
Python, OpenAI, Qdrant
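
To show the difference from fixed character windows, here is a minimal chunking sketch that respects paragraph and sentence boundaries. It is a stand-in for the semantic chunking described above, whose exact method is not specified; the 1,200-character limit, the naive sentence splitter, and the function names are assumptions.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_text(text: str, max_chars: int = 1200) -> list[str]:
    """Pack whole paragraphs into chunks of up to `max_chars` characters."""
    units = []
    for paragraph in re.split(r"\n\s*\n", text):
        paragraph = " ".join(paragraph.split())  # normalise internal whitespace
        if not paragraph:
            continue
        # Oversized paragraphs fall back to sentence-level units.
        if len(paragraph) <= max_chars:
            units.append(paragraph)
        else:
            units.extend(split_sentences(paragraph))

    chunks, current = [], ""
    for unit in units:
        candidate = f"{current} {unit}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # close the chunk at a paragraph or sentence boundary
            current = unit
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    return chunks
```

Closing chunks only at paragraph or sentence boundaries keeps each definition or worked explanation intact, which is the property a fixed window cannot guarantee.
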
02

Decoupled Architecture Over Manual Linking

LearnWorlds and Synthesia had no native connection.

Instead of building a manual linking layer with a custom DB and admin UI, we used independent sync pipelines feeding a shared Algolia index.

This meant:

No manual course-to-video mapping
No ongoing admin curation
Faster delivery
Easier content updates
Tech stack
LearnWorlds, Synthesia, Algolia
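
A decoupled design like this usually depends on both pipelines writing the same record shape. The sketch below shows one possible shared schema; the field names other than `objectID` (Algolia's record identifier) and the input keys (`id`, `title`, `description`) are hypothetical and do not reflect the real LearnWorlds or Synthesia API payloads.

```python
from dataclasses import asdict, dataclass


@dataclass
class SearchRecord:
    objectID: str      # unique per record; prefixed by source to avoid collisions
    source: str        # "learnworlds" or "synthesia"
    content_type: str  # "course" or "video"
    title: str
    body: str          # course description or cleaned transcript


def course_to_record(course: dict) -> dict:
    """Map a course payload (hypothetical keys) to the shared record shape."""
    return asdict(SearchRecord(
        objectID=f"learnworlds-{course['id']}",
        source="learnworlds",
        content_type="course",
        title=course["title"],
        body=course.get("description", ""),
    ))


def video_to_record(video: dict, transcript: str) -> dict:
    """Map a video payload (hypothetical keys) plus its transcript to the shared record shape."""
    return asdict(SearchRecord(
        objectID=f"synthesia-{video['id']}",
        source="synthesia",
        content_type="video",
        title=video["title"],
        body=transcript,
    ))
```

Neither function knows about the other source, so a course and a video on the same topic meet only at query time, through the index, with no mapping table in between.
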
03

Embedding Model Choice

Investment content is dense and vocabulary-heavy. Queries like "EBITDA bridge", "terminal value growth rate", or "DCF assumptions" need financial context, not surface similarity.

We used OpenAI embeddings for the PDF pipeline to improve retrieval quality on finance-specific content.

Better embeddings meant better answers from day one.

Tech stack
OpenAI, Qdrant
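
Retrieval quality on finance queries can be checked with a small hit-rate test. The sketch below is illustrative only: `hit_rate_at_k` accepts any search function, and the keyword-overlap `keyword_search` over three toy chunks is a stand-in so the example runs without an embedding model or Qdrant. The chunk IDs and test cases are invented for the example.

```python
from typing import Callable

# A search function takes (query, k) and returns the top-k chunk IDs.
SearchFn = Callable[[str, int], list[str]]


def hit_rate_at_k(cases: list[tuple[str, str]], search: SearchFn, k: int = 3) -> float:
    """Share of test queries whose expected chunk appears in the top-k results."""
    if not cases:
        raise ValueError("at least one test case is required")
    hits = sum(1 for query, expected_id in cases if expected_id in search(query, k))
    return hits / len(cases)


# Toy corpus standing in for embedded book chunks.
TOY_CHUNKS = {
    "dcf-01": "discounted cash flow dcf assumptions forecast",
    "tv-02": "terminal value growth rate perpetuity",
    "ebitda-03": "ebitda bridge revenue to operating profit",
}


def keyword_search(query: str, k: int) -> list[str]:
    """Stand-in retriever: rank chunks by word overlap with the query."""
    terms = set(query.lower().split())
    ranked = sorted(
        TOY_CHUNKS,
        key=lambda cid: (-len(terms & set(TOY_CHUNKS[cid].split())), cid),
    )
    return ranked[:k]


CASES = [
    ("EBITDA bridge", "ebitda-03"),
    ("terminal value growth rate", "tv-02"),
    ("DCF assumptions", "dcf-01"),
]
score = hit_rate_at_k(CASES, keyword_search, k=1)
```

Swapping `keyword_search` for a function that embeds the query and searches the vector store would let the same test cases compare embedding models or chunking settings.
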
04

Keeping the Agent Grounded

The agent had to teach financial concepts, not give investment advice.

We configured it with:

Grounding against The Investment Analyst's proprietary content
Guardrails against personalised financial advice
Clear fallback behavior when content is not found
References and links back to courses, videos, or book material
Tech stack
ElevenLabs, LearnWorlds
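
The fallback and advice rules can be pictured as a check that runs before the agent answers freely. The sketch below is an assumption about how such a check might look in code; in practice these rules are likely to live mainly in the agent's prompt and configuration. The regular expression, score threshold, reply texts, and all names are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetrievedChunk:
    text: str
    title: str
    url: str
    score: float  # similarity score from the retriever; higher is better


# Illustrative patterns only; they catch a few obvious personal-advice phrasings.
ADVICE_PATTERN = re.compile(
    r"\bshould i (buy|sell|invest|hold)\b|\bmy (portfolio|pension|savings)\b",
    re.IGNORECASE,
)

ADVICE_REPLY = "I can explain the concepts, but I can't give personal investment advice."
FALLBACK_REPLY = "I couldn't find that in the training material."


def select_sources(
    chunks: list[RetrievedChunk], min_score: float = 0.75, limit: int = 3
) -> list[RetrievedChunk]:
    """Keep only confident matches, best first, to use as grounding and references."""
    relevant = [c for c in chunks if c.score >= min_score]
    return sorted(relevant, key=lambda c: c.score, reverse=True)[:limit]


def canned_reply(question: str, chunks: list[RetrievedChunk]) -> Optional[str]:
    """Return a fixed reply when the agent should not answer freely, else None."""
    if ADVICE_PATTERN.search(question):
        return ADVICE_REPLY
    if not select_sources(chunks):
        return FALLBACK_REPLY
    return None
```

When `canned_reply` returns `None`, the titles and URLs from `select_sources` can be passed along so the answer links back to the course, video, or book section it came from.
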
05

Platform Embedding Constraints

LearnWorlds limited how deeply the AI layer could be embedded.

We avoided unsupported platform customization and delivered the experience through widget and iframe embedding.

This kept the integration:

Stable across LearnWorlds updates
Compatible with member-only access
Easy to place across course pages
Independent from LearnWorlds core code
Tech stack
ElevenLabs, LearnWorlds

Real World Challenges

PDF noise would have polluted retrieval

Without filtering repeated headers, captions, blank-page noise, and TOC fragments, the agent would retrieve junk.

Reusability changed the pipeline design

The ingestion module was built configuration-first, so future books would not require a rewrite.

No native LearnWorlds-Synthesia mapping

There was no reliable way to say "this course unit equals this Synthesia video." That forced the decoupled architecture decision early.

Paywall embedding had to be validated

The ElevenLabs widget had to work inside authenticated LearnWorlds pages without unsupported platform changes.
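
For the configuration-first ingestion module mentioned above, a per-book settings object is one plausible shape. The sketch below is hypothetical: the field names, defaults, and validation behaviour are assumptions for illustration, and the embedding model is left as a required value because the model used is not named here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BookIngestionConfig:
    pdf_path: str
    collection_name: str                 # target vector collection for this book
    embedding_model: str                 # required; no default is assumed
    max_chunk_chars: int = 1200
    skip_patterns: tuple[str, ...] = ()  # extra regexes for book-specific noise

    @classmethod
    def from_dict(cls, raw: dict) -> "BookIngestionConfig":
        """Build a config from parsed JSON or YAML, rejecting unknown keys early."""
        allowed = {f.name for f in fields(cls)}
        unknown = set(raw) - allowed
        if unknown:
            raise ValueError(f"Unknown config keys: {sorted(unknown)}")
        values = dict(raw)
        if "skip_patterns" in values:
            # Lists from JSON or YAML are converted so the frozen config stays hashable.
            values["skip_patterns"] = tuple(values["skip_patterns"])
        return cls(**values)
```

With this shape, adding a second book means adding a second config file rather than editing pipeline code.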

What changed for members and platform operations

Content search
Before: Manual page browsing, no semantic understanding
After: Hybrid keyword and semantic search via Algolia NeuralSearch

Cross-source discovery
Before: No way to search LearnWorlds and Synthesia together
After: Unified index across courses, transcripts, and book content

Book knowledge
Before: 500+ page PDF unused by any system
After: Fully embedded in Qdrant and queryable by the AI agent

Member Q&A
Before: No AI assistant on the platform
After: ElevenLabs voice agent grounded in proprietary content

Video transcripts
Before: Locked on Synthesia, not connected to search
After: Auto-synced and indexed into Algolia via pipeline

Content freshness
Before: Manual updates required for every content change
After: Automated pipelines handle sync, updates, and deletions

Course recommendations
Before: Static navigation; no intelligent recommendation
After: Algolia Recommendation API surfaces courses contextually

Financial advice risk
Before: No AI on platform
After: Explicit guardrails in place; agent teaches, not advises

The Team and Timeline

A single engineer delivered the core work over approximately six weeks, while the broader platform engagement ran in parallel.

1–2 weeks

Discovery Phase

Architectural analysis
Platform constraint mapping
Approach evaluation
Decision and sign-off

~6 weeks

PDF Pipeline

PDF extraction and cleaning
Semantic chunking
OpenAI embedding
Qdrant ingestion
Test-query validation
README

Parallel

AI Agent

ElevenLabs v2.0 widget integration
Voice clone configuration
Prompt grounding
Guardrails
CTA placement
Paywall embedding
Algolia layer
NeuralSearch configuration
Index design
Recommendation API
LearnWorlds and Synthesia pipeline builds
Automated sync

Ongoing

Go Live

End-to-end validation
Production deployment
Monitoring
AMC support

Tech Stack

AI & VOICE
ElevenLabs Agents
Professional voice clone
Prompt grounding
Response orchestration

VECTOR DATABASE
Qdrant
OpenAI

SEARCH & RETRIEVAL
Algolia NeuralSearch
Algolia Vector DB
Algolia API
Cloud Functions
Cloud Run Jobs

CONTENT SOURCES
LearnWorlds APIs
Synthesia APIs

INGESTION PIPELINE
Python
Scheduled Workflows
Event-Driven Indexing Workflows

BACKEND SERVICES
Node.js
TypeScript
Secure API Integration
Data Transformation Layers

FRONTEND & EMBEDDING
Embedded Widget
iFrame
Paywall-Compatible CTA Placement

SECURITY
Token-Based API
Environment-Based Secrets Management

Karan Shah
Newsletter

Brew. Build. Breakthrough.

A twice-a-month newsletter from
Karan Shah, CEO & Co-Founder

10K+ Users Already Subscribed

SoluteLabs © 2014-2026

Privacy & Terms
