Photosmyth - A generative AI application
25 March, 2024 - 5 min read
Generative AI applications have revolutionized how we approach creativity, problem-solving, and innovation. With advancements in machine learning and deep learning techniques, such applications have become increasingly accessible, allowing developers to test and deploy them relatively quickly.
In this blog post, I'll share my journey of developing a generative AI application, exploring the challenges, breakthroughs, and insights gained along the way.
A couple of weeks ago, I developed Photosmyth, a generative AI application built with Streamlit that is capable of rapid, on-demand image synthesis.
To develop Photosmyth, I used sdxl-turbo, a fast text-to-image model that can synthesize photorealistic images from a text prompt, provided through NVIDIA AI Foundation Models and Endpoints. sdxl-turbo is a state-of-the-art AI model hosted on the NVIDIA API Catalog, which provides easy access to model APIs optimized on the NVIDIA accelerated computing stack, making them fast and easy to evaluate.
NVIDIA AI models
I used NVIDIA's open-source connector, NVIDIA AI Foundation Endpoints, which integrates with LangChain. This connector provides easy access to NVIDIA-hosted models and supports chat, embedding, code generation, SteerLM, multimodal, and RAG use cases.
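The connector is published as the langchain-nvidia-ai-endpoints package, so installing it is a one-liner:

pip install langchain-nvidia-ai-endpoints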
Setup
- Create a free NGC account.
- Navigate to the page of the model you want to use (sdxl-turbo in my case).
- Click Get API key in the Input section under the language you are using.
- Save the key; this is our NVIDIA_API_KEY.
You need to export the key in your shell to access the model.
export NVIDIA_API_KEY='...'
I have observed that the same key works for all the models in the NVIDIA API Catalog.
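If you prefer to set the key from Python instead of exporting it in your shell, something along these lines works too (a small convenience sketch; the app itself expects the exported variable):

import getpass
import os

# Prompt for the key only if it is not already set in the environment
if not os.environ.get('NVIDIA_API_KEY'):
    os.environ['NVIDIA_API_KEY'] = getpass.getpass('Enter your NVIDIA API key: ')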
Running the code
The code for Photosmyth is available on GitHub.
To run the code, follow these simple steps -
Clone the repository
git clone https://github.com/itsiprikshit/photosmyth.git
Create a virtual environment
python3 -m virtualenv venv
Activate the virtual environment
source venv/bin/activate
Install the required dependencies
pip install -r requirements.txt
Start the Streamlit app
streamlit run app.py --server.port 8012
Navigate to http://localhost:8012 in your browser.
You now have your personal AI, Photosmyth, which generates images on demand.
You can type your prompt into the chat box, and the AI will generate images for you.
Let's dive into the code
Let us walk through several code snippets outlining the implementation of Photosmyth.
First, I import the langchain_nvidia_ai_endpoints package and instantiate the model using the ChatNVIDIA class, which LangChain provides to connect to NVIDIA-hosted models.
from langchain_nvidia_ai_endpoints import ChatNVIDIA

def initialize():
    # Connect to the sdxl-turbo model hosted on the NVIDIA API Catalog
    llm = ChatNVIDIA(model="ai-sdxl-turbo")
    # Override the request payload (see create_payload below)
    llm.client.payload_fn = create_payload
    # Pipe the model output through a decoder that turns base64 into an image
    chain = llm | base64_to_img
    return chain
sdxl-turbo has strict payload expectations that LangChain does not support by default. Fortunately, LangChain lets us override the payload of the underlying client through the payload_fn hook, which is where I build the payload that is eventually passed to the model.
def create_payload(data):
    # sdxl-turbo expects prompts under 'text_prompts' rather than the
    # chat-style 'messages' that LangChain sends by default
    payload = {'text_prompts': []}

    if 'messages' in data:
        for message in data['messages']:
            p = {'text': message['content']}
            payload['text_prompts'].append(p)

    return payload
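For example, given the chat-style input that LangChain passes through, the function reshapes it into the format sdxl-turbo expects (the prompt text here is hypothetical):

create_payload({'messages': [{'content': 'a watercolor lighthouse at dusk'}]})
# -> {'text_prompts': [{'text': 'a watercolor lighthouse at dusk'}]}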
After instantiating the model, I chain the llm instance with base64_to_img, which decodes the base64-encoded image returned by the model.
import base64
from io import BytesIO

def base64_to_img(data):
    # The generated image comes back as a base64 string in the response metadata
    artifacts = data.response_metadata['artifacts']
    img = artifacts[0]['base64']
    return BytesIO(base64.b64decode(img))
Finally, to invoke the chain, I just use the invoke method and pass the user input as a string.
img = chain.invoke(user_input)
The returned image is visualized as the model response in the UI.
To create the chat component, I used Streamlit.
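For reference, a minimal version of the chat flow could look like the following. This is a sketch based on my own assumptions, not the exact code from the repository:

import streamlit as st

# Assumes initialize() from above is available in the same module
chain = initialize()

st.title('Photosmyth')

if prompt := st.chat_input('Describe the image you want'):
    with st.chat_message('user'):
        st.write(prompt)
    with st.chat_message('assistant'):
        # chain.invoke returns a BytesIO holding the decoded image
        st.image(chain.invoke(prompt))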
Maintaining Context
After putting all the pieces together, I could interact with the AI, and it generated images based on my input. However, it treated every input as an independent query. I realized that I wasn't saving the conversation context, so I tweaked the app to save it.
# Concatenate all previous user messages into a single context string
context = ''

for message in st.session_state.messages:
    if message['role'] == 'user':
        context += message['content']
I now provide the entire conversation context to the model on every query, so the images it generates are based on the conversation so far. I am aware that sending the entire conversation with each query is a very naive way of providing context to the model.
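Concretely, the invocation now looks something like this (a sketch; the exact string handling in app.py may differ):

# Prepend the accumulated context to the new prompt
img = chain.invoke(context + ' ' + user_input)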
I plan to enhance the capabilities of my AI by incorporating more sophisticated methods for contextual understanding. I could summarize the conversation using the Natural Language Toolkit (NLTK) or spaCy and provide that summary as context to the model (a rough sketch of this idea follows). I'm also considering a feature where users can upload relevant documents; with a RAG (Retrieval-Augmented Generation) pipeline, I could extract context from these documents to enrich the model's understanding and provide more meaningful responses.
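For illustration, a frequency-based extractive summary built with NLTK could look like the sketch below. Photosmyth doesn't do this yet; the function name and scoring scheme are my own:

import nltk
from collections import Counter

nltk.download('punkt', quiet=True)
nltk.download('stopwords', quiet=True)

from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

def summarize_context(text, n_sentences=2):
    # Score sentences by the frequency of their content words
    # and keep the top n as the summary
    stop = set(stopwords.words('english'))
    words = [w.lower() for w in word_tokenize(text)
             if w.isalnum() and w.lower() not in stop]
    freq = Counter(words)
    sentences = sent_tokenize(text)
    ranked = sorted(sentences,
                    key=lambda s: sum(freq[w.lower()] for w in word_tokenize(s)),
                    reverse=True)
    return ' '.join(ranked[:n_sentences])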
Conclusion
Reflecting on my journey with Photosmyth, I'm truly amazed by the power of generative AI. It's been an exciting adventure exploring how AI can unleash creativity in such a simple yet profound way. By leveraging NVIDIA's advanced AI models, Photosmyth demonstrates the remarkable potential of generative AI in rapidly synthesizing photorealistic images from text prompts. This journey has been enlightening, and I'm eager to see how Photosmyth continues to inspire and evolve.
I hope you enjoyed reading about my experience.
Connect with me on LinkedIn!
Hi, I am Prikshit Tekta. I am a full-stack software engineer.