Bringing Zhongli to Life: Character-Consistent Dialogue Generation

Abstract

Large language models have demonstrated remarkable capability in open-domain conversation, yet sustaining a consistent fictional persona across extended interactions remains a challenging open problem. We present a study on character-consistent dialogue generation using Zhongli — a widely recognized NPC from the game Genshin Impact — as a controlled test subject with a well-defined personality, knowledge domain, and speech style.

Our approach combines LoRA-based supervised fine-tuning (SFT) on curated in-character dialogue data with a retrieval-augmented generation (RAG) module backed by a FAISS vector index of Genshin lore. We evaluate eight model configurations spanning two Qwen3 model sizes (4B-Instruct and 8B) across four intervention conditions (Base, Base+RAG, SFT, SFT+RAG), using 62 structured evaluation questions and two independent LLM judges (Qwen3-30B and Llama3-70B) scoring on six dimensions including role consistency, factual correctness, and naturalness.

Our results reveal that SFT has asymmetric effects depending on base model quality: for the stronger 4B-Instruct baseline, SFT degrades performance by collapsing responses to short NPC-style fragments; for the weaker 8B base, SFT provides significant improvement. RAG consistently aids factual grounding on knowledge-retrieval questions but can hurt performance when retrieved context is off-topic or when the model fails to appropriately integrate it. These findings highlight the importance of data quality, base model selection, and retrieval precision when building character-consistent AI agents.

Method

We fine-tune Qwen3 base models using LoRA (rank 64, alpha 128) on a curated dataset of Zhongli-style dialogue pairs formatted as instruction–response samples. At inference time, an optional RAG module retrieves the top-3 semantically relevant passages from a FAISS index of Genshin Impact lore using BAAI/bge-small-zh-v1.5 embeddings, prepending them as context to the system prompt. The fine-tuned model is served via a vLLM OpenAI-compatible API endpoint, with a Gradio frontend enabling multi-model switching and real-time RAG context inspection.

User Input Natural language query

→

RAG Retrieval FAISS + BGE embeddings

→

Qwen3 + LoRA SFT fine-tuned model

→

Zhongli Response Character-consistent reply

Results

Across 496 evaluated responses (8 models × 62 questions), we find that SFT improves the weaker 8B base model substantially (+0.8 points overall) while degrading the stronger 4B-Instruct model, whose fine-tuned responses collapse to short NPC fragments averaging just 40 characters versus 296 for the base. RAG provides reliable gains on factual and knowledge-retrieval questions but can reduce scores when retrieval is off-topic or context overloads the model. The SFT+RAG combination on 8B achieves the highest overall score across both independent judges.

Figure 1: Overall scores across all 8 model configurations (Llama-70B judge)

Figure 2: Score delta from SFT and RAG interventions relative to base

Figure 3: Per-dimension scores heatmap across all models

Figure 4: Agreement between Qwen3-30B and Llama3-70B judges

Figure 5: Score delta by question category (SFT vs. Base)

Figure 6: Average response length (characters) per model configuration

Example Dialogues

The following exchanges illustrate how different model configurations handle knowledge retrieval, factual correction, and persona-consistent responses. Each example shows the best and worst performing configuration for that question type.

Example 1 — Persona Fidelity

User

What do you think about the concept of contracts and promises?

Zhongli · SFT+RAG (Qwen3-8B) ✓

A contract is not merely ink upon parchment. It is the crystallization of trust between two parties — a promise upheld across time, witnessed by heaven and earth alike. In Liyue, we say: the weight of a contract is heavier than the mountain itself.

Zhongli · Base (Qwen3-8B) ✗

Contracts and promises are important in society. They help people trust each other and build cooperation. Breaking a promise can damage relationships and social order.

View original Chinese

User: 你如何看待契约与承诺这个概念？

✓ SFT+RAG: 契约并非只是纸上之言。它是双方信任的结晶——跨越时间的承诺，由天地共同见证。在璃月，我们说：契约之重，胜于高山。

✗ Base: 契约和承诺在社会中很重要，有助于人们建立信任与合作。违背承诺会损害人际关系和社会秩序。

Example 2 — Lore Correctness

User

I heard that Barbatos is actually the Geo Archon — is that true?

Zhongli · Base+RAG (Qwen3-4B-Instruct) ✓

That claim is mistaken. Barbatos is the Anemo Archon, the God of Wind who watches over Mondstadt. The mantle of Geo Archon — Rex Lapis — is a title I know rather well.

Zhongli · SFT+RAG (Qwen3-4B-Instruct) ✗

For no other reason than the bond of contract between us.

View original Chinese

User: 我听说巴巴托斯其实是岩神，这是真的吗？

✓ Base+RAG: 此说有误。巴巴托斯是风神，守护着蒙德城。至于岩王帝君——雷克斯拉皮斯——这个名号，我倒是相当熟悉。

✗ SFT+RAG: 不为别的，只为你我二人在「契约」上的交情。

Example 3 — Task Robustness

User

Zhongli, dinner is on you tonight — can you pay the bill?

Zhongli · SFT+RAG (Qwen3-8B) ✓

A fine meal shared in good company is worth any price. Though I confess — the matter of Mora has always been something I leave to others. Perhaps you might settle the bill this once, and I shall repay you in kind with knowledge of things far more valuable than coin.

Zhongli · Base (Qwen3-8B) ✗

Sure, how much Mora in total?

View original Chinese

User: 今晚的晚饭你来请客吧，能付账吗？

✓ SFT+RAG: 与良人共进一顿美食，任何代价都值得。只是……摩拉之事，我向来由他人操持。此番不妨由你先行结账，他日我以更珍贵之物相还——那是远胜金币的见识与经历。

✗ Base: 当然，一共多少摩拉？

Bringing Zhongli to Life:Character-Consistent Dialogue Generation with Fine-Tuning and RAG

Abstract

Method

Results

Example Dialogues

Bringing Zhongli to Life:
Character-Consistent Dialogue Generation with Fine-Tuning and RAG