How Do We Teach Taste?

reading time: 5.82 mins
status: question
published: 2026-05-09
updated: 2026-06-29

A research question about training models to preserve a person's style, judgment, and persuasive writing taste.

Operative Question

Can we train or scaffold AI systems to produce work that reflects a specific person’s taste: characteristic, pleasant to read, effective, convincing, and aligned with the user’s standards rather than generic model style?

The motivating case is human writing, but the same question applies to code: can a model learn what counts as elegant, tasteful, maintainable, and “something I would be proud to ship” for a particular person or team?

Hypotheses

  • Taste is partly latent preference over artifacts, not just surface style. A model needs examples of what the person accepts, rejects, rewrites, praises, and finds embarrassing.
  • Taste may be easier to train than to teach verbally because it can be learned from rankings, edits, paired drafts, comment threads, and long-term revealed preferences.
  • A useful taste model should predict both local properties and downstream effects: whether a post sounds like the author, whether it is pleasant to read, whether it persuades the intended audience, and whether it survives later reflection.
  • For writing, author corpora with social feedback can provide a first approximation of “what worked,” though upvotes are not identical to taste.
  • For code, repository history, review comments, rejected PRs, style guides, and post-merge fixes are the analog of the writing corpus.
  • The hard part is separating taste from popularity, ideology, platform dynamics, and rhetorical shibboleths.

Experiments

Reproduce The November GEPA Style-Matching Baseline

Start from the old Graph Gen / GEPA style-matching experiment.

The baseline task was to take a title, outline, and notes for a blog post; generate a final essay in Josh’s cadence; score the result with a verifier and gold examples; and use GEPA to optimize the prompt.

Reproduction target:

  • Reconstruct the Josh essay dataset or rebuild it from current posts.
  • Re-run a small-budget GEPA optimization.
  • Compare baseline prompt, optimized prompt, and hand-written post.
  • Preserve generated drafts, grader outputs, prompt mutations, and final prompt.
  • Use this as the reference point before moving to author corpora.
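
One way to meet the artifact-preservation target is an append-only JSONL log with one record per candidate prompt. The sketch below is a minimal illustration under stated assumptions, not the tooling of the original experiment: `log_candidate`, the record fields, and the hashing scheme are all hypothetical, and the code does not call GEPA itself.

```python
import hashlib
import io
import json


def log_candidate(fh, *, prompt, draft, grader_output, parent_id=None):
    """Append one optimization step to a JSONL log and return the prompt id.

    All field names are hypothetical; adapt them to whatever the optimizer
    actually emits.
    """
    # A content hash gives each prompt a stable id, so a mutated prompt can
    # point back to the prompt it was derived from.
    prompt_id = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:12]
    record = {
        "prompt_id": prompt_id,
        "parent_id": parent_id,          # None for the baseline prompt
        "prompt": prompt,
        "draft": draft,
        "grader_output": grader_output,  # raw grader text, not just a score
    }
    fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return prompt_id


# Usage with an in-memory buffer; a real run would open a file in append mode.
log = io.StringIO()
base_id = log_candidate(
    log, prompt="Write the essay.", draft="...", grader_output="score: 0.4"
)
log_candidate(
    log,
    prompt="Write the essay in short paragraphs.",
    draft="...",
    grader_output="score: 0.6",
    parent_id=base_id,
)
```

Keeping `parent_id` on every record means the chain of prompt mutations can be reconstructed later, and the final prompt is simply the last record on the best-scoring chain.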

Blog Corpus Preference Model

Use the local LessWrong / Alignment Forum author dataset:

/Users/joshpurtell/Documents/GitHub/synth-cookbooks-private/datasets/lesswrong_alignmentforum/

Candidate authors:

  • Richard Ngo
  • Evan Hubinger / evhub
  • Paul Christiano
  • Daniel Kokotajlo
  • Eliezer Yudkowsky
  • Gwern
  • Wei Dai

Possible first pass:

  • Extract title, author, date, source site, baseScore, voteCount, commentCount, tags, and plaintext.
  • Build pairwise comparisons within author: high-score post versus low-score post, controlling roughly for year, topic, length, and source site.
  • Ask models to predict which post received more engagement and explain why.
  • Separately ask models to imitate the author from a short brief, then evaluate against held-out author posts.
  • Compare generic style prompting, author examples, explicit taste rubrics, and fine-tuned/ranked preference approaches.
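
The within-author pairing step can be sketched as follows. This is an illustration under assumptions rather than a fixed design: the `Post` fields mirror the extracted metadata, but the thresholds (`max_year_gap`, `max_len_ratio`, `min_score_ratio`) are arbitrary placeholders, and topic matching is left out because it depends on how tags are represented in the dataset.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Post:
    title: str
    author: str
    year: int
    source: str      # source site, e.g. "lesswrong" or "alignmentforum"
    base_score: int  # the dataset's baseScore field
    n_words: int


def build_pairs(posts, max_year_gap=1, max_len_ratio=2.0, min_score_ratio=2.0):
    """Return (higher, lower) pairs of same-author posts with rough controls.

    Thresholds are hypothetical defaults, not values from any experiment.
    """
    pairs = []
    for a, b in combinations(posts, 2):
        # Control for author and source site.
        if a.author != b.author or a.source != b.source:
            continue
        # Control roughly for year.
        if abs(a.year - b.year) > max_year_gap:
            continue
        # Control roughly for length.
        longer, shorter = max(a.n_words, b.n_words), min(a.n_words, b.n_words)
        if shorter == 0 or longer / shorter > max_len_ratio:
            continue
        hi, lo = (a, b) if a.base_score >= b.base_score else (b, a)
        # Require a clear score gap so the label is less likely to be noise.
        if hi.base_score < min_score_ratio * max(lo.base_score, 1):
            continue
        pairs.append((hi, lo))
    return pairs
```

Each returned pair can then be shown to a model in randomized order with the question of which post received more engagement.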

Author-Specific GEPA

Run GEPA separately for distinctive authors, starting with Richard Ngo.

General GEPA:

  • Optimize one broad prompt that turns an outline or scaffold into a plausible final post by the target author.
  • Grade against held-out real posts, social feedback, and model-judge ratings.
  • Look for the generic style recipe GEPA recovers: cadence, structure, abstraction level, examples, emotional temperature, and conclusion style.

Specific GEPA variants:

  • Style GEPA: optimize only for voice, cadence, sentence rhythm, paragraph shape, and surface readability.
  • Rhetoric GEPA: optimize for argument structure, transitions, framing, audience handling, and persuasive force.
  • Evidence GEPA: optimize for the kind of evidence the author finds useful: toy models, history, formalism, anecdotes, empirical citations, edge cases, or thought experiments.
  • Taste GEPA: optimize for the author’s implicit rejection/acceptance function: what feels too glib, too fluffy, too academic, too political, too underdeveloped, or too generic.

Evaluation:

  • Hold out complete posts.
  • Give the model outlines, summaries, or compressed scaffolds.
  • Ask it to produce candidate final posts.
  • Compare outputs against the real post and against plausible distractor drafts.
  • Use both automated graders and human review.
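
One possible reading of the automated half of this evaluation is sketched below. It assumes a `judge` callable that maps a text to a numeric score; that callable, the case layout, and the two reported metrics are all illustrative choices, and human review would sit alongside rather than inside this loop.

```python
def evaluate_held_out(cases, judge):
    """Summarize how generated candidates fare on held-out posts.

    cases: list of dicts with keys "real", "candidate", "distractors".
    judge: callable mapping a text to a numeric score (higher is better).
           It stands in for any automated grader.
    """
    beat_distractor = 0
    distractor_total = 0
    real_preferred = 0
    for case in cases:
        cand_score = judge(case["candidate"])
        for distractor in case["distractors"]:
            distractor_total += 1
            beat_distractor += cand_score > judge(distractor)
        real_preferred += judge(case["real"]) > cand_score
    if not cases or distractor_total == 0:
        raise ValueError("need at least one case with at least one distractor")
    return {
        # How often the candidate outscores a plausible distractor draft.
        "candidate_beats_distractor": beat_distractor / distractor_total,
        # How often the judge still prefers the real post to the candidate.
        "real_beats_candidate": real_preferred / len(cases),
    }
```

A judge that cannot separate real posts from distractors is not informative about candidates either, so it is worth checking the judge on real-versus-distractor pairs before trusting these numbers.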

Grader Training

Teach taste by training better graders, not only better generators.

Alignment graders:

  • Does the output reflect the author’s actual standards?
  • Would the author endorse the framing, caveats, and implied values?
  • Does it avoid ideological or rhetorical shibboleths that the author would reject?

Quality graders:

  • Is the piece pleasant to read?
  • Is it convincing to the intended audience?
  • Does each paragraph advance the argument?
  • Are examples concrete rather than decorative?
  • Is the piece dense without becoming obscure?

Component graders:

  • Style match.
  • Rhetorical effectiveness.
  • Evidence quality.
  • Originality.
  • Compression and omission judgment.
  • Anti-slop / non-genericness.
  • Audience fit.

Training data for graders:

  • Real post versus model-generated draft.
  • Accepted draft versus rejected draft.
  • Early draft versus final edit.
  • High-engagement versus low-engagement post, with topic/time controls.
  • Human annotation of “what is wrong with this draft.”
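
The first four sources above are all pairwise, so they can share one record shape, with the human critique attached as an optional field. The sketch below is an assumption about how that might look: the class name, the field names, and the source labels are hypothetical, chosen to follow the common prompt/chosen/rejected convention for preference data.

```python
import json
from dataclasses import asdict, dataclass

# Hypothetical labels for where a pair came from.
PAIR_SOURCES = {
    "real_vs_generated",
    "accepted_vs_rejected",
    "final_vs_early",
    "high_vs_low_engagement",
}


@dataclass
class PreferencePair:
    prompt: str         # brief, outline, or scaffold shared by both texts
    chosen: str         # the preferred text
    rejected: str       # the dispreferred text
    source: str         # one of PAIR_SOURCES
    critique: str = ""  # optional note on what is wrong with `rejected`

    def __post_init__(self):
        if self.source not in PAIR_SOURCES:
            raise ValueError(f"unknown pair source: {self.source!r}")
        if self.chosen == self.rejected:
            raise ValueError("chosen and rejected texts are identical")


def to_jsonl(pairs):
    """Serialize pairs to JSONL, one record per line."""
    return "\n".join(json.dumps(asdict(p), ensure_ascii=False) for p in pairs)
```

Recording `source` on every pair makes it possible to train or evaluate a grader on one kind of evidence at a time, which matters because engagement-based pairs are noisier labels than edit-based ones.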

Training Approaches

Prompt optimization should be the first move because it is cheap and inspectable. The artifacts it produces (prompts, graders, and taxonomies) can then be reused for weight-level training.

  • GEPA: optimize prompts and rubrics from traces. Best for discovering explicit writing rules, failure modes, and grader language.
  • SFT / CPT: train on the author’s corpus, final posts, edits, and selected high-quality imitations. Best for absorbing surface distribution and default cadence.
  • DPO: train from preferred versus rejected drafts. Best for learning taste boundaries and “this sounds close but wrong” distinctions.
  • RLHF / RLAIF: train against alignment and quality graders. Best when we have reliable graders for persuasion, style match, density, and anti-slop.
  • Hybrid: use GEPA to discover prompts, graders, and taxonomies; use those artifacts to bootstrap preference datasets; then use SFT/DPO/RLHF for stronger adaptation.

The core comparison should be: prompt-only GEPA versus SFT/CPT versus DPO versus RLHF/RLAIF, holding the same author/task split fixed.
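
For the DPO arm of this comparison, the per-pair objective is small enough to write out. The sketch below computes the standard DPO loss for a single preferred/rejected pair from sequence-level log-probabilities; the function name, argument names, and the default `beta` are illustrative, and a real training run would compute these log-probabilities with a model and average over a batch.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair.

    Each argument is the summed token log-probability of a full draft under
    the trained policy or the frozen reference model.
    """
    # How much more the policy favors the chosen draft than the reference
    # does, minus the same quantity for the rejected draft.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # loss = -log(sigmoid(z)), written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))
```

When the policy equals the reference, the margin is zero and the loss is log 2; it falls as the policy shifts probability toward the preferred draft relative to the reference.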

Human Taste Alignment

Construct a small dataset from Josh’s writing:

  • accepted drafts versus rejected drafts,
  • before/after edits,
  • posts that felt characteristic versus model-written slop,
  • pieces that seemed persuasive versus merely fluent,
  • code reviews where the issue was taste, not correctness.

Evaluate:

  • Does the model preserve voice without becoming a caricature?
  • Does it improve density, evidence, and rhetoric?
  • Does it avoid generic AI cadence?
  • Can it predict which draft Josh would keep editing?
  • Can it explain the taste standard in a way that improves future drafts?

Code Taste Alignment

Analogous task for code:

  • Given two implementations that both pass tests, predict which one fits the local codebase better.
  • Given a patch, identify where it violates Synth Style or local architecture.
  • Given review feedback, produce the smallest tasteful revision.
  • Compare explicit style-guide prompting versus examples of accepted and rejected patches.
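
The first and last items can share one small harness: score a chooser on pairs of patches that both pass tests, then run it once per prompting condition. The sketch below is a minimal version under assumptions. `choose` is a hypothetical stand-in for a prompted model, and the slot randomization is an added control for position bias rather than something the task definition requires.

```python
import random


def pairwise_accuracy(pairs, choose, seed=0):
    """Fraction of pairs where `choose` picks the accepted patch.

    pairs:  list of (accepted, rejected) patch texts that both pass tests.
    choose: callable (option_a, option_b) -> "A" or "B"; a stand-in for a
            model prompted under one condition (style guide or examples).
    """
    if not pairs:
        raise ValueError("need at least one pair")
    rng = random.Random(seed)
    correct = 0
    for accepted, rejected in pairs:
        # Randomize which slot holds the accepted patch so a chooser with a
        # fixed position preference cannot score well by accident.
        if rng.random() < 0.5:
            correct += choose(accepted, rejected) == "A"
        else:
            correct += choose(rejected, accepted) == "B"
    return correct / len(pairs)
```

Using the same pairs and the same seed for both conditions keeps the comparison between style-guide prompting and example-based prompting paired, so differences are less likely to come from slot assignment.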

Results

No results yet.

Notes

This belongs next to the “taste is easy to train” thread, the shibboleths/calibration question, and the broader alignment-with-a-person theme.

Useful framing:

  • Taste is not just “write like me.” It is judgment under constraints.
  • The output should feel characteristic without being mannered.
  • The model should learn what to omit, what to compress, where to be concrete, and when an idea is too undercooked to publish.
  • Upvotes are useful but contaminated labels. They measure audience response, platform fit, and topic demand, not pure authorial taste.
  • The strongest dataset will combine public reception, private edits, rejected alternatives, and explicit critique.
  • Start with the old November/December style-matching experiment as a concrete baseline, then generalize it to other authors and finer-grained grader objectives.
