Accessing LLMs through APIs

LLM providers expose APIs through which applications can send prompts to a model and receive its responses. The following two Python samples illustrate the basic idea.

Accessing LLMs - Sample 1

The following code shows one way to access an LLM. Note that the GEMINI_API_KEY environment variable must be set before running the program.

# Connect to the Gemini LLM,
# send a prompt to it,
# gather the response and print it.

from google import genai

client = genai.Client()  # picks up GEMINI_API_KEY from the environment
query = input("Enter your prompt here: ")

# Send a single prompt and wait for the model to return the entire output
response = client.models.generate_content(model="gemini-2.5-flash", contents=query)
print(response.text)
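
The client does not have to rely on the environment variable alone: the google-genai SDK also accepts the key as a constructor argument. A minimal sketch (the placeholder string is hypothetical; a real key comes from Google AI Studio):

from google import genai

# Pass the key explicitly instead of setting GEMINI_API_KEY beforehand.
# "your-api-key" is a placeholder, not a real credential.
client = genai.Client(api_key="your-api-key")
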
Accessing LLMs - Sample 2

The following code shows another way to access the LLM. As before, the GEMINI_API_KEY environment variable must be set before running the program.

from google import genai

client = genai.Client()
query = input("What do you want to ask?: ")

# Stream the response chunk by chunk, as the Gemini web app does
response = client.models.generate_content_stream(model="gemini-2.5-flash", contents=query)
for chunk in response:
    print(chunk.text, end="", flush=True)  # flush so each chunk shows up immediately
print()  # final newline once the stream is finished

Notes:

generate_content(): Returns the complete generated response in a single object once generation is finished. It is a blocking call: the program waits until the model has produced the full output before proceeding.
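
To make the blocking behaviour visible, the call can be timed with the standard library. A small sketch, assuming the same client setup as in the samples above (the prompt text is arbitrary):

import time

from google import genai

client = genai.Client()

start = time.perf_counter()
# Execution is blocked on this line until the full response is ready
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain REST in one paragraph.",
)
elapsed = time.perf_counter() - start

print(f"Waited {elapsed:.1f}s for {len(response.text)} characters")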

generate_content_stream(): Returns an iterable that yields “chunks” of the generated response as they become available. Parts of the response can therefore be processed while the rest is still being generated, which is particularly useful for long responses or for a more interactive user experience, since it avoids long waits for the full output.
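
Streaming also allows the chunks to be used for more than printing. A small sketch, again assuming the setup from the samples above, that shows the output as it arrives while assembling the full response (the prompt is arbitrary):

from google import genai

client = genai.Client()

parts = []
stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Write a short note on HTTP.",
)
for chunk in stream:
    print(chunk.text, end="", flush=True)  # show each chunk immediately...
    parts.append(chunk.text)               # ...while keeping it for later

full_text = "".join(parts)  # the complete response, assembled from the chunks
print(f"\n\nTotal length: {len(full_text)} characters")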