Advanced prompt engineering
You've mastered the basics; now it's time to go deeper. Advanced prompt engineering involves more sophisticated techniques that give you finer control over LLM responses, help you tackle complex tasks, and improve the reliability and creativity of your outputs.
Elevating your prompts: advanced techniques
1. System, contextual, and role prompting
These three types of prompts work together to provide comprehensive guidance to the LLM:
- System prompt: defines the overall purpose, high-level instructions, or overarching context for the LLM's behavior throughout the conversation (e.g., "you are a helpful assistant specializing in astrophysics.").
- Contextual prompt: supplies immediate, task-specific details, data, or background information that the LLM needs for the current turn in the conversation. This is crucial for adapting to dynamic inputs.
- Role prompt (persona prompting): assigns a specific persona or character to the LLM (e.g., "act as a seasoned travel guide recommending an itinerary for Paris."). This influences the LLM's tone, style, and the type of information it provides.
Example: combined prompting
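For instance, the three prompt types might be combined like this for a beginner's running plan (the details are illustrative):

```
System prompt: You are an experienced running coach who gives safe, encouraging, evidence-based advice.
Contextual prompt: The user is a beginner who can currently run 2 km without stopping and wants to complete a 10 km race in three months.
Role prompt: Act as my personal coach and lay out a week-by-week training plan, explaining each phase in a motivating tone.
```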
This combination helps the LLM provide a tailored, motivating, and relevant running plan.
2. Step-back prompting
Step-back prompting is a technique to mitigate biases and improve reasoning. It involves prompting the LLM to first consider general principles or abstract concepts related to the query *before* addressing the specific task. This encourages more critical thinking.
Example: step-back for financial advice
Instead of directly asking something like:
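```
Should I put my savings into tech stocks right now?
```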
Try a step-back approach instead, for example:
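```
Step 1: What general principles should someone weigh when deciding how to invest their savings (risk tolerance, diversification, time horizon, emergency fund)?
Step 2: Given those principles, how might I evaluate whether moving part of my savings into tech stocks fits my situation?
```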
3. Chain-of-thought (CoT) prompting
Chain-of-thought prompting significantly enhances an LLM's reasoning capabilities, especially for multi-step problems (like math or logic puzzles). You instruct the model to "think step by step" or provide examples that explicitly show the intermediate reasoning steps to reach a final answer.
Example: CoT for a math problem
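A one-shot CoT prompt might look like the following; the worked example demonstrates the step-by-step pattern we want the model to imitate:

```
Q: A cafeteria had 23 apples. They used 20 to make lunch and then bought 6 more. How many apples do they have now?
A: The cafeteria started with 23 apples. After using 20, they had 23 - 20 = 3. After buying 6 more, they had 3 + 6 = 9. The answer is 9.

Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls, and each can has 3 balls. How many tennis balls does he have now?
A:
```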
By providing the reasoning for the first question, the LLM is guided to produce a similar step-by-step answer for the second.
- Use it for tasks requiring reasoning, such as arithmetic, commonsense reasoning, and symbolic manipulation.
- Ensure your examples of thought chains are logical and easy to follow.
4. Self-consistency
Self-consistency builds upon CoT. Instead of just one chain of thought, you prompt the LLM to generate multiple different reasoning paths for the same problem (often by using a higher temperature setting for more diverse outputs). Then, you take the most common answer among these paths as the final, more reliable answer. This is like getting a "second opinion" multiple times and going with the consensus.
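Here is a minimal sketch of the voting step in Python. It assumes a hypothetical `generate(prompt, temperature)` callable that returns one chain-of-thought completion per call, and a prompt that asks the model to finish with a line like "Answer: <value>":

```python
from collections import Counter

def extract_final_answer(completion: str) -> str:
    # Naive parse: take whatever follows the last "Answer:" marker.
    return completion.rsplit("Answer:", 1)[-1].strip()

def self_consistent_answer(prompt: str, generate, n_samples: int = 5,
                           temperature: float = 0.8) -> str:
    """Sample several diverse reasoning paths and return the majority answer."""
    answers = []
    for _ in range(n_samples):
        completion = generate(prompt, temperature=temperature)  # one CoT sample
        answers.append(extract_final_answer(completion))
    # The most common final answer across the sampled paths is the consensus.
    return Counter(answers).most_common(1)[0][0]
```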
5. Tree of thoughts (ToT) & ReAct (briefly)
Tree of thoughts (ToT): this is an even more advanced technique that generalizes CoT. It allows LLMs to explore multiple reasoning branches or "thoughts" simultaneously, like exploring different paths in a maze. It can evaluate intermediate thoughts and make more global choices in its reasoning process.
ReAct (reason + act): this framework combines the LLM's reasoning abilities with the capability to use external tools or APIs. The LLM can "reason" about what information it needs, "act" by calling a tool (e.g., a search engine, a calculator, or an internal database), and then incorporate the tool's output back into its reasoning process to arrive at an answer. This mimics how humans often solve problems by looking up information or performing actions.
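A schematic ReAct trace might look like this (the markers and the `currency_lookup` tool are illustrative; real frameworks define their own):

```
Question: How many euros is my $250 budget worth today?
Thought: I need today's USD-to-EUR exchange rate before I can convert.
Action: currency_lookup["USD to EUR"]
Observation: 1 USD = 0.92 EUR
Thought: 250 * 0.92 = 230, so the budget is about 230 euros.
Final answer: Roughly €230.
```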
Mastering LLM output: configuration parameters
Beyond the prompt itself, you can significantly influence LLM behavior by adjusting its configuration parameters. These settings control aspects like the randomness, length, and diversity of the generated text.
Key configuration parameters
- Output length (max tokens): defines the maximum number of tokens (words/sub-words) the LLM will generate. Setting this too low can cut off responses, while too high can increase costs and computation time. It requires careful balancing with prompt design.
- Temperature: controls the randomness of the output.
- Low temperature (e.g., 0.0 - 0.2): more deterministic, focused, and predictable. Good for factual answers or when you want less variation.
- High temperature (e.g., 0.7 - 1.0): more creative, diverse, and potentially surprising. Good for brainstorming or creative writing, but can sometimes lead to less relevant or nonsensical output.
- Top-K sampling: at each step of generation, the LLM considers only the 'K' most probable next tokens. A higher K allows for more diversity, while K=1 (greedy decoding) means it always picks the single most likely token.
- Top-P (nucleus) sampling: selects the smallest set of tokens whose cumulative probability exceeds a threshold 'P'. This is an alternative to Top-K that can adapt the number of choices based on the probability distribution. For example, if P=0.9, it picks from the most probable tokens that add up to 90% likelihood.
As rough starting points (tune these through experimentation):
- Balanced output: temperature ~0.5, top-P ~0.95, top-K ~30-40.
- Creative output: temperature ~0.7-0.9, top-P ~0.99, top-K ~40-50.
- Factual/deterministic output: temperature ~0.1-0.2, top-P ~0.9, top-K ~20 (or temperature=0).
- Math/logic problems: temperature ~0.0.
A few interactions and pitfalls to keep in mind:
- If temperature is 0, top-K and top-P usually have no effect because the output is deterministic (greedy decoding).
- If top-K is 1, temperature and top-P are often ignored because only the single most likely token is chosen.
- Extreme values in any single sampling setting can negate the effects of others.
- Repetition loops: improperly tuned temperature, top-K, and top-P can sometimes cause LLMs to get stuck repeating words or phrases. Careful tuning and experimentation are key.
Conceptual API call with parameters
Here's how you might imagine setting these in a hypothetical API call (syntax varies by provider):
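The sketch below simply assembles a request payload as a dictionary; the field and model names mirror the concepts above but are placeholders, since each provider's API and SDK name them slightly differently:

```python
import json

# Hypothetical request payload; field and model names are placeholders and
# will differ between providers' APIs and SDKs.
request_payload = {
    "model": "example-model",
    "prompt": "Summarize the following article in three bullet points: ...",
    "temperature": 0.2,   # low randomness -> focused, factual output
    "top_p": 0.9,         # nucleus sampling threshold
    "top_k": 20,          # consider only the 20 most probable next tokens
    "max_tokens": 256,    # cap the response length
}

print(json.dumps(request_payload, indent=2))
# In practice you would send this with your provider's SDK or HTTP client.
```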
Advanced best practices for robust prompts
Use instructions, not constraints (usually)
Frame your prompts positively, telling the LLM what you *want* it to do, rather than what you *don't* want. However, for preventing harmful, biased, or off-topic content, explicit constraints are necessary and appropriate.
Control token length wisely
Balance providing enough detail for the LLM to understand the task with keeping prompts concise to avoid "run-on" generation and manage costs. Be mindful of the LLM's context window.
Utilize variables for dynamic prompts
For applications where parts of the prompt change frequently (e.g., user input, data from a database), design your prompts with placeholders or variables. This makes them reusable and adaptable.
Example template: "Translate the following text to Spanish: {user_text}"
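In code this is usually plain string templating; here is a minimal Python sketch (the `{user_text}` placeholder name is illustrative):

```python
# Reusable prompt template with a placeholder for the dynamic part.
PROMPT_TEMPLATE = "Translate the following text to Spanish: {user_text}"

def build_prompt(user_text: str) -> str:
    """Fill the template with the current user input."""
    return PROMPT_TEMPLATE.format(user_text=user_text)

print(build_prompt("Good morning, how are you?"))
# -> Translate the following text to Spanish: Good morning, how are you?
```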
Structured output with JSON
When you need data in a specific, consistent format, instruct the LLM to output JSON. This is invaluable for data extraction and integration with other systems.
- Benefits: guaranteed consistent style, focus on data extraction rather than parsing messy text.
- Tip: you can provide a JSON schema or an example of the desired JSON structure in your prompt to guide the LLM, as in the sketch after this list.
- JSON repair: be aware that LLMs can sometimes produce malformed JSON, especially if the output is truncated. Consider using JSON repair tools or prompting strategies to handle this.
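For example, a data-extraction prompt that includes an inline example of the target structure might look like this (the fields and the sample text are purely illustrative):

```
Extract the name, email address, and company from the text below.
Respond with only valid JSON in exactly this format:
{"name": "...", "email": "...", "company": "..."}

Text: "Hi, I'm Dana Lee from Acme Corp. You can reach me at dana.lee@example.com."
```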
Experiment and document
Prompt engineering is an empirical science. The most important best practice is to experiment continuously with different phrasings, structures, examples, and model configurations.
Crucially, document your attempts! Keep a log including:
- Prompt name/goal
- LLM model used
- Configuration parameters (temperature, top-K, top-P, max tokens)
- The full prompt text
- The LLM's output (and your evaluation of it)
This documentation will be invaluable for learning what works, what doesn't, and iterating towards optimal prompts.
With these advanced techniques and best practices, you're well on your way to becoming a prompt engineering expert! Next, let's explore some practical use cases.