AI Best Practices: If at First You Don't Succeed, Prompt, Prompt Again

Source: https://www.geekwire.com/2026/ai-best-practices-if-at-first-you-dont-succeed-prompt-prompt-again/
Summary:
A practical guide to AI prompting: from "slot machine" to "power tool"
A recent hands-on guide to using AI effectively argues that many users treat AI like a slot machine, hoping a single input will produce a perfect answer, and usually walk away disappointed. In reality, AI is more like a power tool: the key is mastering how to ask.
Core principle: treat AI as a brilliant but literal-minded new employee
The article suggests thinking of a large language model as a talented new hire on their first day who follows instructions to the letter. They are capable, but they need extremely precise direction. A golden rule for testing prompt quality: show your prompt to a colleague with no context and ask whether they could follow it. If they can't, neither can the AI.
The author adds a caveat: never actually treat the model as a person. A new hire asks questions, remembers context, and notices when an instruction makes no sense; by default, AI does none of this. The metaphor is a reminder to provide concrete context, not an invitation to expect human judgment from the model.
Three foundational techniques with immediate payoff
- Be specific about format, length, audience, and constraints: vague prompts produce vague answers. Instead of "write a marketing trends analysis," write "analyze the three most significant B2B SaaS marketing trends from the past six months, with one company example each and a one-sentence call on whether the trend will accelerate or plateau, as a 400-word brief for a non-technical board."
- Provide a few examples: this is the highest-leverage move in everyday prompting. Models learn patterns from examples far faster than from descriptions. Supply two or three samples of the desired output format and the AI will imitate them precisely.
- Tell the AI what to do, not what not to do: negative instructions are easier to violate. Replace "don't be too formal, don't use jargon, don't be boring" with "write in a warm, conversational tone, the way a smart colleague would explain it over coffee, in plain English and short sentences."
Advanced techniques: iterate like a programmer
- Treat prompting as test-driven development: your first prompt is only a draft. Build a small set of test cases, run the prompt on all of them, change one variable whenever a case fails, and iterate until the output is consistently good.
- Specify a definition of done: tell the model what counts as finished. For example, when debugging a Python error, require that it (1) identify the root cause, (2) propose a specific fix with corrected code, and (3) explain why the original failed, and say so explicitly when it is not confident.
- Calibrate thinking effort to the task: modern reasoning models have "effort" dials. Use low effort for simple extraction and classification, high effort for complex synthesis and strategy. Leaving the dial at its default often hurts performance on hard tasks.
- Build a personal prompt library: save prompts that worked, organized by task type, as templates, rather than starting from scratch each time and forgetting key elements.
Key don'ts
- Don't tell reasoning models to "think step by step"; they already do this internally, and adding the instruction can hurt.
- Don't over-rely on negative instructions like "don't" or "never"; prefer positive framing.
- Don't trust content just because the prose is polished; hallucinations are most dangerous when they are well-written.
- Don't use aggressive language ("MUST," "ABSOLUTELY CRITICAL"); modern models are highly responsive to ordinary language, and aggressive phrasing can trigger overcaution or refusals.
- Don't use undefined acronyms in prompts; they measurably degrade output quality.
- When iterating, change one variable at a time, or you won't know which change helped.
- Different model families need different prompting; one set of prompts won't work across GPT, Claude, and Gemini.
Conclusion
The article stresses that the people who get the most out of AI are not the ones with the best prompt templates, but the ones who treat the model as a powerful tool for advancing their work and are willing to iterate in dialogue. You don't need perfect clarity at the start; a good conversation surfaces options and questions you would have missed on your own. But the responsibility for recognizing the right answer remains yours.
English source:
[Editor’s Note: This is the third in a series by Oren Etzioni about AI usage and best practices. See also “AI Coach or AI Ghostwriter? The Choice Is Yours,” and “How to read with AI.”]
A friend asked ChatGPT for input on a professional matter and received a banal, lackluster response. I suggested she try a different approach: ask for 15 different ideas, scan them, pick the two that felt most promising, and then ask ChatGPT to refine. She came back overjoyed. ChatGPT had not gotten smarter, but she became better at prompting.
This is my favorite gambit: ask AI for many options, delve deeper into the promising ones, and most importantly, if at first you don’t succeed, prompt, prompt again!
What follows is practical advice on how to use AI as a power tool rather than a slot machine. For a simple request, it’s overkill, but if you’re serious about prompting, read on.
Anthropic’s own guidance for prompting Claude contains a helpful hint: treat the model as a brilliant but literal-minded new employee on their first day. They are capable. They are also new. They will do exactly what you ask, so you have to ask exactly what you want.
The Anthropic team’s golden rule is to show your prompt to a colleague with no context and ask whether they could follow it. If the answer is no, the model can’t either. This principle generates a handful of habits that lift output quality immediately, before any of the more advanced techniques come into play.
One caveat from me, though: don’t think of the model as a person. It’s not. The “brilliant new employee” framing is a useful starting point, but it’s a metaphor, not reality. A new hire asks follow-up questions, remembers what you said yesterday, and notices when an instruction is dumb. Claude does none of that by default. Lean on the metaphor to remember to be specific and provide context, but drop it the moment you start to expect human judgment that just isn’t there.
Here’s the playbook, organized as a list for easy reference and periodic review.
Be specific about format, length, audience, and constraints.
Vague prompts produce vague output. The fix is to say what you actually want.
- Before: Write about marketing trends.
- After: Analyze the three most significant B2B SaaS marketing trends from the past six months. For each, give one company example and a one-sentence assessment of whether the trend will accelerate or plateau. Write it as a 400-word brief for a non-technical board.
Improving prompt quality is often simply stating constraints. Vague prompts produce safe, hedged, encyclopedic answers because the model has no signal about what to optimize for and defaults to coverage. Specific prompts produce opinionated, useful answers because the constraints eliminate the safe-but-useless options. Asking for “three” instead of “some” forces ranking. Asking for “accelerate or plateau” forces a call. Asking for “a board brief” determines what gets cut. Each constraint you add is a decision the model no longer gets to dodge.
Provide a few examples.
This is the highest-leverage move in everyday prompting. Models pick up patterns from examples faster than from descriptions.
- Before: Turn these meeting notes into action items.
- After: Turn these meeting notes into action items. Match this format: Example 1: Note: “Sarah will look into the pricing question and get back to us next week.” Action item: Sarah → research pricing options → due next Friday. Example 2: Note: “We agreed to push the launch.” Action item: Team → revise launch timeline → due before Monday’s standup. Now do the same for these notes: [paste]
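The few-shot pattern above can also be assembled programmatically, which keeps the examples and the new input in a consistent shape. A minimal sketch; the `few_shot_prompt` helper and the example pairs are illustrative, not part of any API:

```python
# Build a few-shot prompt from (note, action item) example pairs.
# The model then imitates the demonstrated format for the new input.

EXAMPLES = [
    ("Sarah will look into the pricing question and get back to us next week.",
     "Sarah → research pricing options → due next Friday"),
    ("We agreed to push the launch.",
     "Team → revise launch timeline → due before Monday's standup"),
]

def few_shot_prompt(task: str, examples, new_input: str) -> str:
    """Compose a prompt: task instruction, worked examples, then the new case."""
    parts = [task, "Match this format:"]
    for i, (note, action) in enumerate(examples, start=1):
        parts.append(f'Example {i}: Note: "{note}" Action item: {action}.')
    parts.append(f"Now do the same for these notes: {new_input}")
    return "\n".join(parts)

prompt = few_shot_prompt(
    "Turn these meeting notes into action items.",
    EXAMPLES,
    "Dev said he'd update the roadmap doc by Thursday.",
)
print(prompt)
```

Two or three examples are usually enough; the point is that the format is demonstrated rather than described.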
Tell the model what to do, not what not to do.
Negative instructions are easier to violate than positive ones. Reframing in the affirmative gets you cleaner results.
- Before: Don’t be too formal. Don’t use jargon. Don’t make it boring.
- After: Write in a warm, conversational tone, the way a smart colleague would explain this over coffee. Use plain English and short sentences.
Match the style of your prompt to the style of the output you want.
This one surprises some people. If your prompt is full of bullets and bold text, the model will return bullets and bold text. If you want flowing prose, write in flowing prose.
These habits sound modest. But applied together, they take prompts from the level my friend was operating at, where ChatGPT seemed unhelpful, to a level where AI yields dividends left and right. The advanced techniques in the rest of this piece build on this foundation, but they won’t rescue a prompt that fails the basics.
Beyond the basics, here is a set of effective habits that show up in guidance from OpenAI, Google, working developers, and the people who build production AI systems for a living. These are not techniques so much as workflow disciplines.
Iterate; treat prompting as test-driven.
Your first prompt is a draft. The most experienced practitioners build small sets of test cases (the inputs they care about), run their prompt across them, and refine until the output is consistently good. Several open-source toolkits exist to formalize this loop.
- Before: Write the prompt. Try it on one example. Looks good. Ship it.
- After: Write the prompt. Pick five inputs, including the awkward edge cases. Run the prompt on all five. Where it fails, change one thing in the prompt and retest. Keep the version that works on the most cases.
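That workflow fits in a tiny harness: one prompt template, a handful of cases, and pass/fail checks. A sketch, assuming `call_model` stands in for whatever LLM API you use (the stub below just echoes so the harness is runnable):

```python
# A minimal test-driven prompting harness: run one prompt template over
# several inputs, apply simple pass/fail checks, and report failing cases.
# `call_model` is a placeholder; swap in a real LLM API call.

def call_model(prompt: str) -> str:
    # Stand-in for an actual model call.
    return "Summary: " + prompt[:40]

def evaluate(prompt_template: str, cases: list) -> list:
    """Run the template on every case; return the ids of failing cases."""
    failures = []
    for case in cases:
        output = call_model(prompt_template.format(**case["inputs"]))
        if not all(check(output) for check in case["checks"]):
            failures.append(case["id"])
    return failures

cases = [
    {"id": "short-note", "inputs": {"notes": "Ship v2 Friday."},
     "checks": [lambda out: out.startswith("Summary:")]},
    {"id": "empty-note", "inputs": {"notes": ""},
     "checks": [lambda out: len(out) > 0]},
]

failing = evaluate("Summarize these notes: {notes}", cases)
print("failing cases:", failing)
```

When a case fails, change one thing in the template and rerun all cases; keep the version with the fewest failures. Tools like Promptfoo formalize exactly this loop.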
Specify a definition of done.
OpenAI’s own guidance for GPT-5 stresses telling the model what counts as a finished answer. Without that, the model decides for itself, often by stopping at the first plausible-looking response.
- Before: Help me debug this Python error.
- After: Help me debug this Python error. You are done when: (1) you have identified the root cause, (2) you have proposed a specific fix with the corrected code, and (3) you have explained why the original failed. If you are not confident on any of those three, say so explicitly rather than guessing.
Calibrate effort to the task.
Modern reasoning models have effort or thinking dials. Low effort for extraction and triage; high for synthesis and strategy. Most users leave them on default and pay for it on hard problems.
- Before: Summarize this 80-page report.
- After: Set thinking effort to high. Read the entire report. Identify the three most important findings, the two weakest claims, and the one question I should ask the authors. Cite page numbers.
Inject current or proprietary context directly.
Be careful to avoid jargon and abbreviations unknown to the model (instead of the acronym PMO, say “Project Management Office”). Models don’t have access to your internal documents. Paste in the relevant material.
- Before: How should I structure a related work section comparing my framework to prior agent governance proposals?
- After: Below is my current draft related work section, plus PDFs of the three papers I am positioning against (pasted). Based only on these sources, identify points of overlap I have not yet acknowledged and any claims in my draft that the cited papers would not actually support.
Build a personal prompt library.
This is a power move for a pro. The patterns that worked yesterday are likely to work tomorrow. Stop rewriting them from scratch. Save the prompts that consistently produce good results, organized by task type. Treat them as living documents, not one-off attempts.
- Before: Open a new chat. Type out the framing, the constraints, the examples, and the question from memory. Watch yourself forget two of them.
- After: Open your prompt library. Copy the “draft a memo for my manager” template. Paste in today’s specific topic and source material. Run.
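A prompt library can be as simple as named templates with required fields. A sketch; the template names and fields here are made up for illustration:

```python
# A personal prompt library: saved templates organized by task type.
# str.format raises KeyError on a missing field, which is the point --
# the template remembers the required pieces so you don't have to.

PROMPT_LIBRARY = {
    "memo-for-manager": (
        "Draft a memo for my manager about {topic}. "
        "Audience: {audience}. Length: {length} words. "
        "Source material: {source}"
    ),
    "meeting-actions": (
        "Turn these meeting notes into action items in the form "
        "'owner → task → deadline': {notes}"
    ),
}

def render(name: str, **fields) -> str:
    """Fill a saved template; fails loudly if a field is forgotten."""
    return PROMPT_LIBRARY[name].format(**fields)

prompt = render(
    "memo-for-manager",
    topic="Q3 hiring plan",
    audience="non-technical manager",
    length=300,
    source="[paste]",
)
print(prompt)
```

A plain text file or notes app works just as well; the dictionary version simply makes the "forgot a constraint" failure mode impossible to miss.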
Here are some key don’ts:
Don’t tell reasoning models to “think step by step.”
Models like OpenAI’s o-series and GPT-5 thinking already do that internally. Adding the instruction can hurt rather than help. Save it for the everyday models.
Don’t lean on “do not” or “never” instructions for everything.
Models, especially Gemini, can over-index on broad negative constraints and degrade on basic reasoning. Prefer positive framing: tell the model what to do.
Don’t trust polished prose as evidence of correctness.
Hallucinations are most dangerous when they are well-written. As I pointed out in How to Read with AI, you have to carefully verify AI output.
Don’t use aggressive language (“CRITICAL: You MUST…”).
Modern models are highly responsive to ordinary instructions. Aggressive phrasing can produce overcautious output and trigger refusals. Use normal language.
Don’t include undefined acronyms in your prompt.
They measurably degrade output. For research on the impact of prompt changes see this recent paper on Brittlebench.
Don’t change three things at once when iterating.
When a prompt isn’t working, change one variable, test, then change the next. Otherwise you don’t know what helped.
Don’t assume that the same prompt works across models.
Different model families need different prompting. The same instruction can help one and hurt another. The temperature and effort settings that work for GPT are not the ones that work for Claude or Gemini.
Don’t treat the first answer as the final one.
Failing to iterate is a common failure mode in everyday AI use. Here’s a trick for making AI better at multi-step tasks: after each attempt, have the AI write a short critique of what went wrong and tuck that note into its memory for the next try. No fancy mechanics, just the model “talking to itself” in plain English. On the next attempt, it reads its own past reflections and adjusts. This loop can produce meaningful gains over one-shot prompts.
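The self-critique loop described above can be sketched in a few lines: attempt, critique, store the note, and prepend all past notes to the next attempt. `call_model` is again a placeholder for a real LLM call; the stub makes the loop runnable:

```python
# The reflection loop: after each attempt, the model critiques its own
# output in plain English, and the notes are fed into the next attempt.
# `call_model` is a placeholder; swap in a real LLM API call.

def call_model(prompt: str) -> str:
    # Stand-in for an actual model call.
    return f"(model output for a {len(prompt)}-char prompt)"

def reflective_attempts(task: str, rounds: int = 3) -> list:
    """Run several attempts, carrying forward plain-English self-critiques."""
    notes = []
    outputs = []
    for _ in range(rounds):
        memory = "\n".join(f"- {n}" for n in notes)
        prompt = f"{task}\n\nLessons from earlier attempts:\n{memory}" if notes else task
        output = call_model(prompt)
        outputs.append(output)
        critique = call_model(
            f"Task: {task}\nYour answer: {output}\n"
            "In one sentence, what went wrong or could be improved?"
        )
        notes.append(critique)
    return outputs

results = reflective_attempts("Plan a three-step data migration.", rounds=2)
print(len(results), "attempts made")
```

No fancy mechanics: each round's prompt simply grows by one bullet of the model's own feedback, which is the "talking to itself" behavior the article describes.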
The people who get the most out of AI aren’t the ones with the best prompt templates. They’re the ones who treat the model as a powerful tool for advancing their work. You don’t need to show up with perfect clarity about what you want. A good dialog can get you there, surfacing options and questions you’d have missed on your own. What it can’t do is recognize the right answer when it appears. That part is still on you.
For further reading …
Provider documentation:
- Anthropic prompt engineering best practices.
- OpenAI prompt engineering guide.
- OpenAI GPT-5 prompt guidance.
- Google Gemini prompting strategies.
- Google Vertex AI prompt design.
Practitioner resources:
- Promptfoo (test-driven prompt engineering).
- Promptimize (Preset, test-driven prompts).
- PromptHub (success criteria and evals).
- GitHub on prompt engineering practice.
Editor’s note: GeekWire publishes guest opinions to foster informed discussion and highlight a diversity of perspectives on issues shaping the tech and startup community. If you’re interested in submitting a guest column, email us at [email protected]. Submissions are reviewed by our editorial team for relevance and editorial standards.
Article title: AI Best Practices: If at First You Don't Succeed, Prompt, Prompt Again
Article link: https://news.qimuai.cn/?post=3986
All articles on this site are original; do not use them for any commercial purpose without authorization.