Whet Documentation

Technical reference.

What is Whet

Whet is a structured career data platform that functions as a Single Source of Truth (SSOT) for professional history. It ingests atomic career data, such as projects, experiences, skills, and education, and dynamically generates derived representations, including career cards, structured DOCX documents, and semantic search indices. The architecture decouples data persistence from presentation, so the canonical record is stored long-term independently of any ephemeral output format.

How It Works

1. Data Ingestion

The system accepts structured input via an Admin Dashboard. Data is normalised and stored in Firestore, a NoSQL document database.

  • Input: JSON-based structures for distinct entities (e.g., Project, Experience, Skill).
  • Storage: Data is stored in a per-user namespace within the database.
  • Validation: Frontend DataService enforces schema contracts.
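As a rough illustration of the validation and namespacing steps, the sketch below checks a payload against a per-entity schema and builds a per-user document path. The field names, the `REQUIRED_FIELDS` table, and the `users/{uid}/{type}s/{id}` path layout are hypothetical; the actual DataService contracts and Firestore layout are not specified here, and the sketch is plain Python rather than the frontend's own code.

```python
from typing import Any

# Hypothetical required fields per entity type; the real schema contracts may differ.
REQUIRED_FIELDS: dict[str, dict[str, type]] = {
    "project": {"id": str, "title": str, "bullets": list},
    "experience": {"id": str, "role": str, "organisation": str},
    "skill": {"id": str, "name": str},
}


def validate(entity_type: str, payload: dict[str, Any]) -> list[str]:
    """Return a list of schema violations (empty if the payload is valid)."""
    schema = REQUIRED_FIELDS.get(entity_type)
    if schema is None:
        return [f"unknown entity type: {entity_type}"]
    errors = []
    for name, expected in schema.items():
        if name not in payload:
            errors.append(f"missing field: {name}")
        elif not isinstance(payload[name], expected):
            errors.append(f"field {name} should be {expected.__name__}")
    return errors


def document_path(user_id: str, entity_type: str, entity_id: str) -> str:
    """Build a hypothetical per-user path, e.g. users/u1/projects/p1."""
    return f"users/{user_id}/{entity_type}s/{entity_id}"
```

Only payloads for which `validate` returns an empty list would be written under the user's namespace.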

2. Transformation Pipeline

Whenever canonical data is modified, or on a manual trigger, the RAG Service transforms it into derived state.

  • Chunking: Text is split into atomic units using structural heuristics rather than fixed token limits.
  • Embedding: Each chunk is converted to a vector embedding by an embedding model.
  • Indexing: Vectors are stored in a private index for retrieval operations.
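A minimal sketch of structural chunking, assuming a project record with a `summary` string and a `bullets` list (both hypothetical field names): each logical unit becomes one chunk tagged with its entity and section, and no token counting is involved. The `Chunk` type is illustrative, not the RAG Service's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    entity_type: str  # e.g. "project"
    entity_id: str
    section: str      # logical boundary, e.g. "summary" or "detail"


def chunk_project(project: dict) -> list[Chunk]:
    """Split a project into chunks at structural boundaries, not token counts."""
    chunks = []
    summary = project.get("summary", "").strip()
    if summary:
        chunks.append(Chunk(summary, "project", project["id"], "summary"))
    # Each bullet point is already an atomic unit, so it becomes its own chunk.
    for bullet in project.get("bullets", []):
        text = bullet.strip()
        if text:  # skip empty or whitespace-only bullets
            chunks.append(Chunk(text, "project", project["id"], "detail"))
    return chunks
```

Each chunk's `text` would then be passed to the embedding model, with its metadata stored alongside the resulting vector so that later retrieval can filter on it.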

3. Dynamic Output

The system generates read-only views from the canonical or transformed data.

  • Web: Next.js renders profile pages server-side (SSR) directly from Firestore data.
  • Documents: Resumes are assembled as DOCX files with python-docx from the most relevant retrieved chunks.
  • Chat: LLMs generate responses from retrieved chunks, ranked by semantic relevance (cosine similarity).
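The cosine-similarity ranking used for retrieval can be sketched as follows. The in-memory dictionary stands in for the private vector index and the two-dimensional vectors are toy values; real embeddings have many more dimensions, and a production index would not score every entry linearly.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Rank chunk ids by cosine similarity to the query vector, highest first."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The top-ranked chunks would then be passed to the LLM as context for its response.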

Core Concepts

  • Canonical Data Layer: The persistent store (Firestore) holding the definitive state of a user's career. It is layout-agnostic and acts as the master record.
  • Derived Representation: A view or format generated from canonical data (e.g., a PDF, a vector index, a UI card). These are ephemeral and can be regenerated.
  • Atomic Content: The storage of data at the lowest useful level (e.g., a single bullet point or skill) to allow flexible reassembly in different contexts.
  • Hybrid RAG: A retrieval strategy that combines dense vector search (semantic) with metadata filtering (keyword/type-based) to locate precise career details.
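One way to realise the hybrid strategy, sketched under assumptions: filter candidates on metadata first, then rank the survivors by cosine similarity. The `IndexedChunk` fields (notably `year`) and the filter-then-rank order are illustrative choices, not a description of Whet's retriever; other hybrid designs fuse keyword and vector scores instead.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class IndexedChunk:
    chunk_id: str
    vector: list[float]
    entity_type: str  # metadata, e.g. "experience" or "project"
    year: int         # hypothetical metadata field


def _cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_search(
    query_vec: list[float],
    chunks: list[IndexedChunk],
    entity_type: Optional[str] = None,
    year: Optional[int] = None,
    k: int = 3,
) -> list[str]:
    """Metadata filter first (exact match), then dense ranking of the survivors."""
    candidates = [
        c for c in chunks
        if (entity_type is None or c.entity_type == entity_type)
        and (year is None or c.year == year)
    ]
    ranked = sorted(candidates, key=lambda c: _cosine(query_vec, c.vector), reverse=True)
    return [c.chunk_id for c in ranked[:k]]
```

A query such as "Find leadership examples from 2021" would map to an embedding of the leadership intent plus a `year=2021` filter.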

Key Capabilities

  • Canonical Persistence: Storage of career history in a vendor-neutral, typed JSON schema.
  • Resume Generation: Automated selection and formatting of relevant experience into industry-standard document formats.
  • Semantic Retrieval: Natural language querying capabilities against the career database (e.g., "Find leadership examples from 2021").
  • Multi-View Rendering: Capability to render the same underlying data as a timeline, a grid, or a document.
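To show what multi-view rendering means in practice, this hypothetical sketch derives two views, a chronological timeline and a grid of cards, from the same list of records without modifying them. The `Experience` fields are assumptions, and a real renderer (such as the Next.js frontend) would emit markup rather than strings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    title: str
    organisation: str
    start_year: int


def timeline_view(items: list[Experience]) -> list[str]:
    """One line per entry, oldest first; sorted() leaves the input list untouched."""
    return [
        f"{e.start_year}: {e.title} @ {e.organisation}"
        for e in sorted(items, key=lambda e: e.start_year)
    ]


def grid_view(items: list[Experience], columns: int = 2) -> list[list[str]]:
    """Card titles laid out in rows of `columns`, preserving input order."""
    cards = [e.title for e in items]
    return [cards[i:i + columns] for i in range(0, len(cards), columns)]
```

Because both views are pure functions of the canonical records, either can be discarded and regenerated at any time.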

What Makes It Distinct

  • SSOT Architecture: Updates to a project in the database immediately propagate to search indices, web views, and future document generations.
  • Decoupled Automation: AI/LLM logic is implemented as a transformation service, not a creative author. It formats and retrieves existing data rather than inventing new content.
  • Structural Chunking: Data is indexed by logical boundary (e.g., "Summary" vs "Detail") rather than arbitrary token counts, preserving semantic integrity.
  • Vendor Agnostic: Data is stored as raw text/JSON, ensuring exportability and reducing platform lock-in.