Methodology

How Rewritelyapp actually polishes writing, in plain language.

Most AI writing tools won’t tell you how they work. We will. This page is the technical companion to our weekly proof table: every component of the polish pipeline, what it does, and why we built it that way.

Last updated May 21, 2026

Pipeline

The polish process, in five stages.

Your draft goes through five distinct stages. Each one has a job, an input, an output, and a measurable failure mode.

1

Signal detection

Your draft is scored against 14 signals known to mark machine-generated prose. The current set includes low-perplexity bursts, hedging absence, formulaic transitions, nominal-phrase density, citation-style anomalies, sentence-length monotony, AI-typical words and bigrams (‘delve into’, ‘multifaceted’, ‘leverage’), absent failure-statements, and missing first-person specifics. Each score is anchored to a 12,000-document reference distribution. Output: a per-paragraph signal-score vector.
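As a rough illustration of what a per-paragraph signal-score vector might look like, here is a minimal sketch covering two of the signals named above: sentence-length monotony and AI-typical phrases. The scoring formulas, the function names, and the naive sentence splitter are all assumptions made for this example; they are not the production scorers, and the phrase list contains only the three examples given on this page.

```python
import re
import statistics

# Only the three examples named on this page; the real list is not published here.
AI_TYPICAL_PHRASES = ("delve into", "multifaceted", "leverage")


def split_sentences(paragraph: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def sentence_length_monotony(paragraph: str) -> float:
    """Assumed formula: 1 minus the coefficient of variation of sentence
    lengths, clamped to [0, 1]. Higher means more uniform sentences."""
    lengths = [len(s.split()) for s in split_sentences(paragraph)]
    if len(lengths) < 2:
        return 0.0
    mean = statistics.mean(lengths)
    if mean == 0:
        return 0.0
    cv = statistics.pstdev(lengths) / mean
    return max(0.0, min(1.0, 1.0 - cv))


def ai_phrase_rate(paragraph: str) -> float:
    """Occurrences of the listed phrases per 100 words."""
    words = paragraph.split()
    if not words:
        return 0.0
    lowered = paragraph.lower()
    hits = sum(lowered.count(phrase) for phrase in AI_TYPICAL_PHRASES)
    return 100.0 * hits / len(words)


def score_document(text: str) -> list[dict[str, float]]:
    """Return one score dict per paragraph (paragraphs split on blank lines)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    return [
        {
            "sentence_length_monotony": sentence_length_monotony(p),
            "ai_phrase_rate": ai_phrase_rate(p),
        }
        for p in paragraphs
    ]
```

A paragraph of three four-word sentences that each contain ‘delve into’ would score high on both signals, while a paragraph mixing a one-word sentence with a long one would score low on monotony.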

2

Section & entity extraction

For research papers, this stage detects sections (Abstract / Intro / Methods / Results / Discussion) via header parsing plus a fine-tuned classifier. It extracts citations ((Author, Year), [12], \cite{key}, DOIs) into a citation table, extracts defined variables, and flags quantitative claims (numbers + units + p-values) as protected.
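The four citation shapes listed above can be approximated with regular expressions. The sketch below is an assumption about how such a citation table could be built, not the extractor we ship: the patterns are deliberately simple, and `extract_citations` and the row fields are names invented for the example.

```python
import re

# Simplified, illustrative patterns for the four citation shapes named above.
CITATION_PATTERNS = {
    # (Smith, 2020) or (Smith et al., 2021a)
    "author_year": re.compile(r"\([A-Z][^()\d]*?,\s*\d{4}[a-z]?\)"),
    # [12] or [3, 7] or [4-6]
    "numeric": re.compile(r"\[\d+(?:\s*[,\-]\s*\d+)*\]"),
    # \cite{key}, \citep{key}, \citet{a,b}
    "latex_cite": re.compile(r"\\cite[a-zA-Z]*\{[^}]+\}"),
    # DOIs such as 10.1000/xyz123
    "doi": re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+"),
}


def extract_citations(text: str) -> list[dict]:
    """Return citation rows sorted by position in the text."""
    rows = []
    for kind, pattern in CITATION_PATTERNS.items():
        for match in pattern.finditer(text):
            raw = match.group(0)
            if kind == "doi":
                # Drop sentence punctuation that trails the DOI.
                raw = raw.rstrip(".,;)")
            rows.append({"kind": kind, "text": raw, "start": match.start()})
    rows.sort(key=lambda row: row["start"])
    return rows
```

Real manuscripts contain many more variants (multiple authors, page ranges, prefixed DOIs), which is one reason a classifier and hand-checked rules would sit alongside patterns like these.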

3

Strategy selection

Based on the document type and the signal-score vector, the polish engine activates a subset of 18 named strategies. For personal statements these include specificity, hedging, and thesis-forward opening; for research papers, nominal-phrase reduction, claim-tightening, and journal-fit verbs. Each strategy is a constrained rewrite rule with an explicit input/output contract, not a free-form rephrase.
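One way to picture strategy selection is as a lookup from document type and signal scores to strategy names. In the sketch below the six strategy names come from this page, but the pairing of each strategy with a trigger signal, the 0.5 threshold, and the `Strategy` and `select_strategies` names are hypothetical, chosen only to make the example concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    doc_type: str
    trigger_signal: str  # hypothetical pairing, not documented on this page
    threshold: float     # hypothetical activation threshold


STRATEGIES = (
    Strategy("specificity", "personal_statement", "missing_first_person_specifics", 0.5),
    Strategy("hedging", "personal_statement", "hedging_absence", 0.5),
    Strategy("thesis_forward_opening", "personal_statement", "formulaic_transitions", 0.5),
    Strategy("nominal_phrase_reduction", "research_paper", "nominal_phrase_density", 0.5),
    Strategy("claim_tightening", "research_paper", "hedging_absence", 0.5),
    Strategy("journal_fit_verbs", "research_paper", "ai_typical_bigrams", 0.5),
)


def select_strategies(doc_type: str, signal_scores: dict[str, float]) -> list[str]:
    """Activate every strategy for this document type whose trigger signal
    meets its threshold. Missing signals count as 0.0."""
    return [
        s.name
        for s in STRATEGIES
        if s.doc_type == doc_type
        and signal_scores.get(s.trigger_signal, 0.0) >= s.threshold
    ]
```

For example, a research paper with high nominal-phrase density and few AI-typical bigrams would activate only `nominal_phrase_reduction` under these assumed thresholds.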

4

Constrained rewrite

Selected strategies are applied via a fine-tuned language model under three hard constraints: (a) protected entities pass through bit-perfect; (b) each section has its own polish budget (Methods is the most conservative); (c) the output must match the user’s voice fingerprint: the sentence-length distribution, lexical diversity, and 4-gram profile of their prior documents. The polish model is Claude Sonnet 4.6 served via Vertex AI Model Garden plus a Rewritelyapp-trained adapter; no user data trains the adapter.
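To make the voice fingerprint concrete, here is a minimal sketch of its three components computed from a user's prior documents. It assumes word-level 4-grams and type-token ratio as the lexical-diversity measure; this page does not specify either, and the function names are illustrative.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens (letters and apostrophes only)."""
    return re.findall(r"[A-Za-z']+", text.lower())


def sentence_lengths(text: str) -> list[int]:
    """Word count of each sentence, using a naive punctuation splitter."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [len(tokenize(s)) for s in sentences]


def lexical_diversity(tokens: list[str]) -> float:
    """Type-token ratio: distinct words divided by total words (assumed measure)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def four_gram_profile(tokens: list[str]) -> Counter:
    """Counts of consecutive word 4-grams (word-level is an assumption)."""
    return Counter(tuple(tokens[i:i + 4]) for i in range(len(tokens) - 3))


def voice_fingerprint(prior_documents: list[str]) -> dict:
    """Pool the three components across all prior documents."""
    all_tokens: list[str] = []
    lengths: list[int] = []
    grams: Counter = Counter()
    for doc in prior_documents:
        doc_tokens = tokenize(doc)
        all_tokens.extend(doc_tokens)
        lengths.extend(sentence_lengths(doc))
        grams.update(four_gram_profile(doc_tokens))
    return {
        "sentence_lengths": sorted(lengths),
        "lexical_diversity": lexical_diversity(all_tokens),
        "four_grams": grams,
    }
```

Computing 4-grams per document, rather than over the pooled token list, avoids counting spurious 4-grams that span two unrelated documents.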

5

Validation

Output is re-scored on (a) the 14 detection signals; (b) the relevant rubric (23-criterion for statements, style-guide compliance for papers); (c) three detector models: GPTZero, ZeroGPT, and an internal ensemble. Any edit that pushes a metric backward is reverted automatically.
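The revert rule can be sketched as a loop that accepts an edit only if no metric gets worse. This is a simplified illustration under two assumptions: every metric is expressed so that higher is better, and edits are applied one at a time. The `apply_with_revert` name and the toy metrics in the usage note are invented for the example.

```python
from typing import Callable

Edit = Callable[[str], str]
Metric = Callable[[str], float]  # assumed convention: higher is better


def apply_with_revert(text: str, edits: list[Edit], metrics: dict[str, Metric]) -> str:
    """Apply edits in order, keeping each one only if no metric regresses."""
    current = text
    for edit in edits:
        candidate = edit(current)
        baseline = {name: metric(current) for name, metric in metrics.items()}
        regressed = any(
            metric(candidate) < baseline[name] for name, metric in metrics.items()
        )
        if not regressed:
            current = candidate  # accept the edit
        # otherwise the edit is dropped and `current` stays unchanged
    return current
```

With a toy brevity metric and a metric that penalizes the word ‘leverage’, an edit that swaps ‘leverage’ for ‘use’ is kept, while an edit that pads the sentence with extra words is reverted.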

Constraints

What we hard-coded into the pipeline.

These are the constraints we won’t turn off, even if a user asks us to. They’re what makes Rewritelyapp safe for academic use.

  • No content generation. Sentences that don’t exist in your draft can’t exist in the output. The model has no ‘completion’ mode.
  • Citation preservation. Citation keys are extracted, replaced with placeholders, and reinserted character-for-character.
  • Quantitative-claim lock. Numbers, units, p-values, CIs, and method names are protected. Sentences containing them are polished around the protected fragment, not through it.
  • Methods-section discipline. Active-voice conversion and sentence-level rewrites are disabled in Methods sections by default.
  • Voice cap. Polish can’t move your sentence-length distribution or lexical diversity by more than 12% in either direction.
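Two of the constraints above lend themselves to a short sketch: the citation placeholder round-trip and the 12% voice cap. Everything here is illustrative rather than our production code. The `__CITn__` placeholder format, the simplified citation pattern, and the function names are assumptions, and the cap is applied to a single summary number (such as mean sentence length), which is only one possible reading of "moving a distribution by 12%".

```python
import re
from typing import Callable

# Simplified pattern: \cite{key}, [12], or (Author, Year).
CITATION_RE = re.compile(r"\\cite\{[^}]+\}|\[\d+\]|\([A-Z][^()\d]*?,\s*\d{4}[a-z]?\)")
PLACEHOLDER_RE = re.compile(r"__CIT(\d+)__")  # hypothetical placeholder format


def protect(text: str) -> tuple[str, list[str]]:
    """Swap each citation for a numbered placeholder and record the original."""
    table: list[str] = []

    def stash(match: re.Match) -> str:
        table.append(match.group(0))
        return f"__CIT{len(table) - 1}__"

    return CITATION_RE.sub(stash, text), table


def restore(text: str, table: list[str]) -> str:
    """Put the recorded citations back, character for character."""

    def unstash(match: re.Match) -> str:
        index = int(match.group(1))
        return table[index] if index < len(table) else match.group(0)

    return PLACEHOLDER_RE.sub(unstash, text)


def polish_with_citations_locked(text: str, rewrite: Callable[[str], str]) -> str:
    """Run `rewrite` on the masked text; reject the result (return the
    original) unless every citation survives in the original order."""
    original_citations = CITATION_RE.findall(text)
    masked, table = protect(text)
    restored = restore(rewrite(masked), table)
    if CITATION_RE.findall(restored) != original_citations:
        return text
    return restored


def within_voice_cap(before: float, after: float, cap: float = 0.12) -> bool:
    """True if a summary statistic moved by at most `cap` (relative change)."""
    if before == 0:
        return after == 0
    return abs(after - before) / abs(before) <= cap
```

Because the rewrite step only ever sees placeholders, it cannot alter a citation's characters; the final comparison catches the remaining failure, a rewrite that drops or duplicates a placeholder.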
Reference distribution (statement polish)
  1. 1,200 admitted personal statements, Common App + UCAS + Sciences Po, 2023–2025, hand-annotated against 23 criteria.
  2. Inter-annotator agreement κ = 0.78 across two annotators.
  3. Held-out test set: 240 statements, never seen during model tuning.
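For readers unfamiliar with κ: with two annotators, the usual statistic is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below assumes that is the statistic reported above (this page does not name the variant) and uses made-up labels in the usage note.

```python
from collections import Counter


def cohens_kappa(labels_a: list, labels_b: list) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty item set.")
    n = len(labels_a)
    # Observed agreement: share of items where the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)
```

For instance, if two annotators agree on three of four pass/fail judgments and chance agreement is 0.5, kappa is (0.75 - 0.5) / (1 - 0.5) = 0.5.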
Reference distribution (paper polish)
  1. 14 journal style guides, current as of Q1 2026.
  2. 3,400 published manuscripts from those journals, used for tone calibration only.
  3. Section classifier validated on 800 held-out papers; F1 = 0.94.
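As a reminder of what an F1 score summarizes, here is a small sketch of per-class and macro-averaged F1 for a section classifier. Whether the 0.94 above is macro-averaged, micro-averaged, or weighted is not stated on this page, so macro averaging here is an assumption, and the labels in the usage note are toy data.

```python
def f1_per_class(gold: list[str], predicted: list[str]) -> dict[str, float]:
    """F1 for each label: 2*TP / (2*TP + FP + FN)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length.")
    scores = {}
    for label in sorted(set(gold) | set(predicted)):
        tp = sum(g == label and p == label for g, p in zip(gold, predicted))
        fp = sum(g != label and p == label for g, p in zip(gold, predicted))
        fn = sum(g == label and p != label for g, p in zip(gold, predicted))
        denominator = 2 * tp + fp + fn
        scores[label] = 2 * tp / denominator if denominator else 0.0
    return scores


def macro_f1(gold: list[str], predicted: list[str]) -> float:
    """Unweighted mean of the per-class F1 scores."""
    scores = f1_per_class(gold, predicted)
    return sum(scores.values()) / len(scores) if scores else 0.0
```

With two Methods and two Results paragraphs where one Methods paragraph is misread as Results, the per-class scores are 2/3 and 0.8, and macro F1 is their mean.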
Detector evaluation
  1. 200-essay corpus, rotated quarterly. See proof page.
  2. Five detectors via public API, default thresholds, paying-tier accounts.
  3. Weekly run published every Monday, 06:00 UTC.
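The headline pass rate is an average of per-detector pass rates. The sketch below shows that arithmetic under two assumptions: each detector returns an AI-likelihood score between 0 and 1, and an essay passes when its score falls below that detector's threshold. The detector names and scores in the usage note are placeholders, not real results.

```python
def pass_rate(scores: list[float], threshold: float) -> float:
    """Fraction of essays scored below the detector's threshold."""
    if not scores:
        raise ValueError("At least one score is required.")
    return sum(score < threshold for score in scores) / len(scores)


def average_pass_rate(
    results: dict[str, list[float]], thresholds: dict[str, float]
) -> float:
    """Mean of the per-detector pass rates, as a percentage."""
    rates = [pass_rate(results[name], thresholds[name]) for name in results]
    return 100.0 * sum(rates) / len(rates)
```

If one placeholder detector passes three of four essays and another passes all four, the average is (75% + 100%) / 2 = 87.5%.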
Evaluation

How we know it works.

95.8%

Avg detector pass rate across five detectors, 200-essay corpus, week of May 21.

+38

Median rubric-score gain on personal statements (out of 100), measured across our reference corpus.

0

Citations altered or claims fabricated. The placeholder pipeline rejects any polish that doesn’t round-trip bit-perfect.

Model card

The model behind the polish.

Base model: Claude Sonnet 4.6 via Vertex AI Model Garden + Rewritelyapp adapter v3.0
Training data (adapter): Public academic writing guides + 4,800 author-permissioned essays. No user-submitted documents.
Hosting: Vertex AI (europe-west4) + Rewritelyapp application layer (Cloud Run, europe-west1)
Latency (median): 2.1 s per 500 words polished
Known failure modes: Documents with mixed languages, equation-heavy LaTeX, and footnote-heavy humanities prose see degraded performance. We flag this at upload.
Last model update: v3.0 · May 14, 2026

Now run it on something of yours.

The methodology is only convincing on paper. The diff is convincing on your own draft.
