Agentic AI — Ram N Sangwan
Generative AI Concepts
Prompt and Prompt Engineering
K-Shot Prompts
Max Tokens, Temperature
Top-K and Top-P

Artificial Intelligence — The ability of machines to mimic the cognitive abilities and problem-solving capabilities of human intelligence.
Machine Learning — A subset of AI that focuses on creating computer systems that can learn and improve from experience, powered by algorithms that incorporate intelligence into machines.

On a scale of 0 to 100, how introverted/extraverted are you? Have you ever taken a personality test like Big Five Personality Traits?
These tests ask you a list of questions, then score you along a number of axes, introversion/extraversion being one of them.

Imagine I’ve scored 38/100 as my introversion/extraversion score. We can plot that along a single axis.

Let’s switch the range to be from -1 to 1.

How well do you feel you know a person knowing only this one piece of information?
Not much.
Let’s add another dimension – the score of another trait.


A common way to calculate a similarity score for two vectors is cosine similarity.

Person #1 is more like me. Vectors pointing in the same direction have a higher cosine similarity score; cosine similarity measures only the angle between vectors, so their lengths do not affect the score.
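The comparison above can be sketched in a few lines of Python. The 2-D "personality vectors" below are made-up values on the -1 to 1 scale, chosen only to illustrate the idea:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal, -1.0 = opposite. Vector length cancels out, so
    only direction matters."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 2-D personality vectors on the -1..1 scale:
# (introversion/extraversion, second trait)
me = [-0.24, 0.80]
person_1 = [-0.30, 0.70]
person_2 = [0.60, -0.40]

print(cosine_similarity(me, person_1))  # close to 1: similar direction
print(cosine_similarity(me, person_2))  # negative: pointing away from me
```

Note that scaling a vector does not change its score: a vector and its double have cosine similarity 1.0.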
Two central ideas:
Prompt — The text provided to an LLM as input, sometimes containing instructions and/or examples.

Prompt Engineering — The process of iteratively refining a prompt to elicit a particular style of response. Prompt engineering is challenging, often unintuitive, and not guaranteed to work; at the same time, it can be effective, and multiple tested prompt-design strategies exist.

In-context learning — Conditioning an LLM with instructions and/or demonstrations of the task it is meant to complete.
K-Shot Prompting — Explicitly providing k examples of the intended task in the prompt.
Few-shot prompting is widely believed to improve results over 0-shot prompting.
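A k-shot prompt is just text containing k worked examples followed by the actual query. The sketch below builds a hypothetical 2-shot prompt for sentiment classification; the reviews and labels are invented for illustration:

```python
# A 2-shot prompt: the two labelled reviews are the "shots"; the final,
# unlabelled review is the task the model is asked to complete.
k_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week.
Sentiment: Negative

Review: Setup was painless and support answered within minutes.
Sentiment:"""

print(k_shot_prompt)
```

Sending this string as the prompt conditions the model on the demonstrated input/output format, so it is likely to reply with just a label rather than free-form text.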

One token can be a part of a word, an entire word, or punctuation.
A word such as “friendship” is made up of two tokens — “friend” and “ship.”
The number of tokens per word depends on the complexity of the text.
Simple text: ~1 token/word on average.
Complex text (less common words): 2-3 tokens/word on average.
Many words map to one token, but some don't: indivisible.
Language models understand tokens rather than characters.
A common word such as “apple” is a token.
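The splitting behaviour can be illustrated with a toy greedy longest-match tokenizer over a tiny, hand-picked vocabulary. Real LLM tokenizers (e.g. byte-pair encoding) learn their vocabulary from data, but the effect is the same: common words become one token, rarer words split into pieces.

```python
# Tiny made-up subword vocabulary, chosen to reproduce the examples above.
VOCAB = {"apple", "friend", "ship", "in", "divis", "ible", " "}

def tokenize(text):
    """Greedy longest-match tokenization: at each position, consume the
    longest vocabulary entry that matches; fall back to one character."""
    tokens = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            tokens.append(text[i])  # unknown character: emit it alone
            i += 1
    return tokens

print(tokenize("apple"))        # ['apple'] - one common word, one token
print(tokenize("friendship"))   # ['friend', 'ship'] - two tokens
print(tokenize("indivisible"))  # ['in', 'divis', 'ible'] - several tokens
```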

This is the maximum length of the output that the model can generate in one response, measured in tokens. If the max tokens limit is set, the model will not generate more tokens than that limit in its response.
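The cut-off behaviour can be sketched with a toy generation loop. The "model" here is a stand-in function, not a real LLM: generation stops either when a stop token is emitted or when the max-tokens limit is reached, whichever comes first.

```python
def generate(next_token_fn, max_tokens, stop_token="<eos>"):
    """Toy generation loop: ask the stand-in model for one token at a
    time; stop at the stop token or at the max_tokens limit."""
    output = []
    while len(output) < max_tokens:
        token = next_token_fn(output)
        if token == stop_token:
            break
        output.append(token)
    return output

# Stand-in "model" that would happily generate forever
endless = lambda ctx: "word"
print(len(generate(endless, max_tokens=5)))  # 5 - truncated by the limit

# Stand-in "model" that stops itself after two tokens
brief = lambda ctx: "hi" if len(ctx) < 2 else "<eos>"
print(generate(brief, max_tokens=100))  # ['hi', 'hi'] - limit never reached
```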
If Top k is set to 3, the model will only pick from the 3 most probable tokens and ignore all others: mostly “United”, but occasionally “Netherlands” or “Czech”.

If p is set to 0.15, the model will only pick from “United” and “Netherlands”, since their probabilities add up to 14.7%. If p is set to 0.75, the least probable outputs making up the bottom 25% of the distribution are excluded.
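Both filters can be sketched over a toy next-token distribution. The country names and probabilities below echo the example above and are illustrative only. Note the top-p convention followed here matches the slide (the kept tokens' cumulative probability stays within p); some implementations instead take the smallest set whose cumulative probability reaches p.

```python
def top_k_filter(probs, k):
    """Keep only the k most probable tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the most probable tokens whose cumulative probability stays
    within p; at least one token is always kept."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, total = {}, 0.0
    for token, prob in ranked:
        if kept and total + prob > p:
            break
        kept[token] = prob
        total += prob
    return kept

# Illustrative next-token probabilities (not real model output)
next_token = {"United": 0.10, "Netherlands": 0.047, "Czech": 0.03,
              "Belgium": 0.02, "Denmark": 0.01}

print(top_k_filter(next_token, 3))     # United, Netherlands, Czech
print(top_p_filter(next_token, 0.15))  # United, Netherlands (sum 14.7%)
```

After filtering, the model samples from the surviving tokens (with their probabilities renormalised), which is where temperature comes into play.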


Cohere develops large-scale language models and encapsulates them within an intuitive API, so language processing can be integrated into any system.
Cohere provides a range of models that can be trained and tailored to suit your specific use cases.




import os
import cohere
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv())

co = cohere.Client(api_key=os.getenv('COHERE_API_KEY'))

# New Chat API V2
co_v2 = cohere.ClientV2(api_key=os.getenv('COHERE_API_KEY'))
Define the Cohere client with the API key.

response = co.embed(
    texts=['hello', 'goodbye'],
    model='embed-english-v3.0',
    input_type='classification'
)