A Custom GPT can approximate our polish prompt. It can’t reproduce the persistent voice fingerprint, the multi-detector validation loop, the citation-safe extraction, or the deadline-aware orchestration. This page is the technical companion to Application Bundle and Research Paper. Skip if you don’t care how the sausage is made.
Everything runs on Google Cloud (Vertex AI for models, GCP for infrastructure) plus Supabase for data. Single bill, single region, single trust boundary — which matters when an institutional buyer asks for one DPA.
Google Cloud · Vertex AI
Claude Sonnet 4.6 + Opus 4.7 (polish), Gemini for embeddings + cross-validation, Vertex AI batch prediction (50% off, nightly detector runs), Vertex AI context caching (for the reference corpus), Vertex AI Search (journal style guide RAG). All in europe-west4.
Google Cloud · Infrastructure
Cloud Run (app + PDF export), Cloud Tasks (bundle orchestration), Cloud Scheduler (deadline intelligence), Document AI (citation-safe extraction), BigQuery (detector audit log), Cloud KMS (PDF signing).
Supabase
Postgres + pgvector for voice fingerprint storage and HNSW similarity search, Row-Level Security, Auth, Realtime for collaborative review, Storage for encrypted documents.
Every saved document is embedded as a 1,536-dimensional vector and stored in Supabase pgvector. Polish output is constrained to land within the 95th-percentile cosine-distance contour around the user’s voice centroid. Because the centroid and contour are re-estimated from more of the user’s own writing with every save, the polish on doc #15 reads more like the user than the polish on doc #1 did.
Why Skills can’t: stateless across chats, no document memory, no embedding column, no constraint mechanism.
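A minimal sketch of the envelope check. It assumes the user's stored embeddings are already loaded as plain float lists (in production they would be read from pgvector) and treats the contour as a radius: the 95th percentile of each saved document's cosine distance to the centroid. `within_voice_envelope` and its helpers are hypothetical names, not the product's code.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; 0.0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def centroid(vectors: Sequence[Vector]) -> list[float]:
    """Element-wise mean of the user's document embeddings."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def percentile(values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile, q in [0, 100]."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    rank = (q / 100) * (len(ordered) - 1)
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def within_voice_envelope(candidate: Vector, history: Sequence[Vector],
                          q: float = 95.0) -> bool:
    """True if the candidate is no farther from the voice centroid than
    the q-th percentile of the user's own saved documents."""
    center = centroid(history)
    radius = percentile([cosine_distance(v, center) for v in history], q)
    return cosine_distance(candidate, center) <= radius
```

In pgvector the distance half of this can be pushed into SQL with the `<=>` cosine-distance operator, which an HNSW index built with `vector_cosine_ops` accelerates.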
Every polish queues a Cloud Task. Overnight, Vertex AI batch prediction runs the output through five detector APIs: GPTZero, Originality.ai, Copyleaks, ZeroGPT, and Sapling. Per-document verdicts are written to BigQuery, and the Disclosure Bundle attaches the actual API responses, not a screenshot.
Why Skills can’t: no infrastructure for outbound paid API calls; no persistent verdict storage; can’t produce a downloadable artifact.
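A minimal sketch of how the five responses might be collapsed into one audit-log row. It assumes each response has already been normalized to an `ai_probability` between 0 and 1; the real detector APIs return different schemas, so that field, the 0.5 cutoff, and `build_audit_row` are all hypothetical. The raw responses are kept verbatim as JSON strings so they can be attached to the Disclosure Bundle unchanged.

```python
import json
from datetime import datetime, timezone

DETECTORS = ("gptzero", "originality", "copyleaks", "zerogpt", "sapling")


def build_audit_row(doc_id: str, responses: dict[str, dict],
                    cutoff: float = 0.5) -> dict:
    """Collapse per-detector responses into one row for the audit log.

    `responses` maps detector name -> raw response dict that (by assumption)
    carries a normalized `ai_probability` in [0, 1].
    """
    missing = [d for d in DETECTORS if d not in responses]
    if missing:
        raise ValueError(f"missing detector responses: {missing}")

    scores = {d: float(responses[d]["ai_probability"]) for d in DETECTORS}
    flagged = [d for d, s in scores.items() if s >= cutoff]
    return {
        "doc_id": doc_id,
        "checked_at": datetime.now(timezone.utc).isoformat(),
        "scores": scores,
        "flagged_by": flagged,
        "passed": not flagged,  # passes only if no detector flags it
        # Untouched responses, serialized for the Disclosure Bundle.
        "raw_responses": {d: json.dumps(responses[d], sort_keys=True)
                          for d in DETECTORS},
    }
```

A row shaped like this could be streamed into the audit table with the `google-cloud-bigquery` client's `insert_rows_json`; the nested `scores` field would need a RECORD column in the table schema.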
Upload .docx, .tex, or .pdf. Google Document AI extracts the structure first: sections, citations, equations, and tables, with bounding boxes. The polish model sees only prose tokens; protected entities pass through as placeholders. Reassembly is deterministic and bit-perfect.
Why Skills can’t: raw text in, raw text out, with no structured extraction step, so citations get rewritten ~5% of the time at Discussion-section length.
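The placeholder round trip can be sketched with a regular expression. This version protects only bracketed numeric citations such as `[12]` and inline `$...$` math, an assumption made for illustration; the described pipeline protects whatever entities Document AI returns. `protect`, `restore`, and the `@@ENT0@@` token format are hypothetical.

```python
import re

# Bracketed numeric citations ([3], [4, 7], [2-5], [2–5]) and inline $...$ math.
PROTECTED = re.compile(r"\[\d+(?:[,\u2013-]\s*\d+)*\]|\$[^$]+\$")


def protect(text: str) -> tuple[str, list[str]]:
    """Swap each protected span for an opaque token the model must copy as-is."""
    entities: list[str] = []

    def _swap(match: re.Match) -> str:
        entities.append(match.group(0))
        return f"@@ENT{len(entities) - 1}@@"

    return PROTECTED.sub(_swap, text), entities


def restore(text: str, entities: list[str]) -> str:
    """Put the original spans back; fail loudly if a token was lost or duplicated."""
    for i, original in enumerate(entities):
        token = f"@@ENT{i}@@"
        if text.count(token) != 1:
            raise ValueError(f"placeholder {token} was altered or duplicated")
        text = text.replace(token, original)
    return text
```

Because the model never sees the citation text, it cannot reword it; the count check catches the remaining failure mode, where a token is dropped or repeated.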
Application Bundle is a Cloud Tasks state machine, not a single document. Per-applicant state in Supabase, deadline schedule built from the school’s known due date, per-document polish passes orchestrated and rate-limited. Cloud Scheduler triggers per-user evaluations daily and Resend sends the nudge email when a document needs attention.
Why Skills can’t: a chat session has no concept of a deadline, no multi-document state, no scheduled actions, and no email delivery.
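A toy version of the per-document state machine and the daily nudge check. The states, the allowed transitions, the 14-day `nudge_lead_days` window, and every name below are hypothetical; in the system described above the state would live in Supabase, Cloud Scheduler would trigger a check like `documents_needing_nudge` once a day, and its output would be handed to Resend.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum


class DocState(Enum):
    DRAFT = "draft"
    QUEUED = "queued"
    POLISHED = "polished"
    VALIDATED = "validated"


# Allowed transitions; anything else is rejected.
TRANSITIONS = {
    DocState.DRAFT: {DocState.QUEUED},
    DocState.QUEUED: {DocState.POLISHED},
    DocState.POLISHED: {DocState.VALIDATED, DocState.QUEUED},  # re-polish allowed
    DocState.VALIDATED: set(),
}


@dataclass
class BundleDoc:
    name: str
    state: DocState = DocState.DRAFT

    def advance(self, new_state: DocState) -> None:
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: {self.state.value} -> {new_state.value} not allowed")
        self.state = new_state


def documents_needing_nudge(docs: list[BundleDoc], due: date, today: date,
                            nudge_lead_days: int = 14) -> list[str]:
    """Names of unfinished documents once the deadline is inside the lead window."""
    if today < due - timedelta(days=nudge_lead_days):
        return []
    return [d.name for d in docs if d.state is not DocState.VALIDATED]
```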
14 journal style guides are indexed in Vertex AI Search. When polishing a Methods section for Bioinformatics, the pipeline retrieves the 8–12 most relevant chunks live and includes them in the cached context. Adding a new journal becomes ‘drop a PDF in a bucket’.
Why Skills can’t: static prompt size budget; can’t hold 14 style guides in one chat; can’t retrieve by section.
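Vertex AI Search does the ranking, but the selection step after it can be sketched locally. This assumes each returned chunk carries `journal`, `section`, and a relevance `score` (hypothetical field names) and keeps between 8 and 12 chunks, per the range above; the `min_score` cutoff of 0.5 is an illustrative assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    journal: str
    section: str
    text: str
    score: float  # relevance score from the search backend (assumed field)


def select_style_chunks(results: list[Chunk], journal: str, section: str,
                        min_k: int = 8, max_k: int = 12,
                        min_score: float = 0.5) -> list[Chunk]:
    """Keep between min_k and max_k chunks for this journal and section, best first."""
    relevant = sorted(
        (c for c in results if c.journal == journal and c.section == section),
        key=lambda c: c.score,
        reverse=True,
    )
    confident = [c for c in relevant if c.score >= min_score][:max_k]
    if len(confident) >= min_k:
        return confident
    # Too few confident hits: top up with the next-best chunks, if any exist.
    return relevant[:min_k]
```

Filtering on section before ranking is what lets a Methods prompt carry Methods rules only, instead of the whole guide.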
At export, Cloud Run + headless Chrome compiles a 14-page PDF: every polished document, every edit annotated with reasoning, every detector verdict from real API runs, voice fingerprint summary, and a ready-to-paste disclosure sentence. SHA-256 signed and timestamped via Cloud KMS. This is the artifact your reviewer can accept, not a chat log.
Why Skills can’t: no PDF generation, no signed artifacts, no reproducible audit trail.
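The hashing half of the signing step needs only the standard library. This sketch computes the SHA-256 digest of the exported PDF and builds a small manifest; the signer is injected as a callable so it could be backed by Cloud KMS in production (for example the `google-cloud-kms` client's `asymmetric_sign`, which accepts a precomputed digest) and by a stub in tests. `sign_bundle`, `verify_digest`, and the manifest fields are hypothetical.

```python
import base64
import hashlib
from datetime import datetime, timezone
from typing import Callable

# A signer takes a SHA-256 digest and returns signature bytes.
Signer = Callable[[bytes], bytes]


def sign_bundle(pdf_bytes: bytes, signer: Signer, key_version: str) -> dict:
    """Hash the PDF, sign the digest, and return a manifest to ship alongside it."""
    digest = hashlib.sha256(pdf_bytes).digest()
    signature = signer(digest)
    return {
        "sha256": digest.hex(),
        "signature_b64": base64.b64encode(signature).decode("ascii"),
        "key_version": key_version,
        # Timestamp recorded alongside the signature (assumed field).
        "signed_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_digest(pdf_bytes: bytes, manifest: dict) -> bool:
    """Integrity check: does the file still hash to the signed digest?"""
    return hashlib.sha256(pdf_bytes).hexdigest() == manifest["sha256"]
```

A reviewer can recompute the hash of the PDF they received; verifying the signature itself additionally requires the key's public half.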
```
          Upload (.docx/.tex/.pdf)
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 1. Document AI ─ structure extraction    │  Google Cloud
│    citations, equations, tables          │  ($0.0015/page)
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 2. Signal detection (14 AI-typical)      │  Claude Sonnet 4.6
│    + section classifier for papers       │  + Rewritelyapp adapter
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 3. Vertex AI Search ─ style-guide RAG    │  Google Cloud
│    pulls relevant rules for THIS section │  (~$2/1K queries)
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 4. pgvector lookup ─ user voice envelope │  Supabase
│    + 50 nearest admitted-statement refs  │  (HNSW index)
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 5. Constrained polish                    │  Claude Sonnet 4.6
│    voice envelope + protected entities   │  prompt-cached
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 6. Validation + revert                   │  ensemble check
│    rubric, detector, voice constraints   │  any regression → revert
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 7. Cloud Tasks ─ enqueue nightly         │  → 5 detector APIs
│    batch validation                      │  via Vertex AI batch
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ 8. Disclosure Bundle PDF on demand       │  Cloud Run + Chrome
│    14 pages, signed, timestamped         │  ($0.04/bundle)
└──────────────────────────────────────────┘
```
Median wall-clock: 2.4s per 500 words polished + overnight detector run
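The revert rule in step 6 fits in a few lines. A minimal sketch, assuming every check returns a score where higher is better; `validate_or_revert` and the two toy checks are hypothetical stand-ins for the rubric, detector, and voice constraints, not the pipeline's real scorers.

```python
from typing import Callable

# A check scores a text; higher is better (assumed convention).
Check = Callable[[str], float]


def validate_or_revert(original: str, candidate: str,
                       checks: dict[str, Check]) -> tuple[str, list[str]]:
    """Return the candidate only if no check scores it below the original."""
    regressions = [
        name for name, score in checks.items()
        if score(candidate) < score(original)
    ]
    if regressions:
        return original, regressions  # any regression -> revert
    return candidate, []


# Toy checks for illustration only.
toy_checks: dict[str, Check] = {
    "rubric": lambda t: -t.count("very"),          # fewer intensifiers is better
    "voice": lambda t: 1.0 if "I " in t else 0.0,  # keeps the first person
}
```

Reverting the whole passage, rather than trying to repair the candidate, keeps the rule simple: the user never receives text that scored worse than what they wrote on any check.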
| Component | Rate | 12-month spend (est.) |
|---|---|---|
| Cloud Tasks + Scheduler (orchestration) | Per-call, negligible | ~€20 |
| Document AI (citation-safe extraction) | $0.0015 / page | ~€150 |
| Vertex AI Search (style-guide RAG) | $2 / 1K queries | ~€240 |
| Cloud Run (app + PDF export) | Per-second compute | ~€180 |
| BigQuery (detector audit log) | 1 TiB of queries free / month | ~€0 |
| 5 detector API subscriptions | Variable | ~€800 |
| Vertex AI · Claude Sonnet polish (with caching) | ~$0.0008 / 1K cached tokens | ~€420 |
| Vertex AI · Gemini embeddings (voice fingerprint) | $0.13 / M tokens | ~€60 |
| Total against GCP credit (incl. models) | | ~€1,870 / €2,000 |
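The total row is the plain sum of the per-component estimates. A quick check, with figures copied from the table in euros, also gives the headroom left under the €2,000 credit.

```python
# 12-month estimates from the table, in EUR.
spend = {
    "Cloud Tasks + Scheduler": 20,
    "Document AI": 150,
    "Vertex AI Search": 240,
    "Cloud Run": 180,
    "BigQuery": 0,
    "Detector API subscriptions": 800,
    "Claude Sonnet polish (Vertex AI)": 420,
    "Gemini embeddings": 60,
}
CREDIT = 2000

total = sum(spend.values())
headroom = CREDIT - total
print(total, headroom)  # 1870 130
```

The detector subscriptions are the single largest line, at €800 of the €1,870.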
By the time the credit runs out, the orchestrated Application Bundle business should be paying for itself. If it isn’t, the bet was wrong, and we learn that within a year, not three.
The architecture is only impressive when it shows up in your diff. Paste one paragraph and the pipeline runs exactly as it does for a paying user.