Whether you are working on a recommendation engine, a search engine, or any other application that involves understanding and comparing pieces of text, it’s likely you’ll need to convert that text into a numerical form. This process, known as vectorization, allows us to apply mathematical techniques to analyze and compare our text.
In this blog post, we will use the OpenAI API to generate vectors from the text content of any given web page. Then, we’ll store these vectors in a Redis database for fast retrieval and comparison later.
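To make the idea of comparing text mathematically concrete, here is a minimal sketch of cosine similarity, one common way to compare two embedding vectors. The function name and the tiny three-dimensional vectors are illustrative assumptions; real embeddings from text-embedding-ada-002 have 1536 dimensions.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors standing in for embeddings (hypothetical values)
doc_a = [1.0, 0.0, 1.0]
doc_b = [2.0, 0.0, 2.0]   # same direction as doc_a
doc_c = [0.0, 1.0, 0.0]   # orthogonal to doc_a

print(cosine_similarity(doc_a, doc_b))
print(cosine_similarity(doc_a, doc_c))
```

A value close to 1 means the two vectors point in nearly the same direction, and a value close to 0 means they are unrelated.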

Setup
Before we start, make sure you have the following Python packages installed:
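beautifulsoup4 for parsing HTML and extracting the content we want.
requests for downloading the web page.
openai for generating vectors from text (a version below 1.0, since the code calls openai.Embedding.create).
redis for connecting to our Redis database.
numpy for handling the vectors.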
You will also need to sign up for an OpenAI account and get an API key at OpenAI Platform.
Create an Index in Redis Using redis-cli
FT.CREATE posts ON HASH PREFIX 1 "post:" SCHEMA url TEXT
This command creates an index called “posts” on all Redis hash objects with keys that start with “post:”. It adds one field to the index, “url”, which is a text field.
Run this command in redis-cli after connecting to your Redis server. FT.CREATE is provided by the RediSearch module (included in Redis Stack), so the server must have that module loaded.
Fetching the Content
We’ll use the requests library to download a web page and the Beautiful Soup library to parse its content:
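from bs4 import BeautifulSoup
import requests

def fetch_content(url):
    # Download the page and fail early on HTTP errors
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # The post body is assumed to live in <div class="entry-content">
    entry = soup.find('div', {'class': 'entry-content'})
    if entry is None:
        raise ValueError(f"No entry-content div found at {url}")
    return entry.text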
In this code, we’re fetching the web page at the given URL and parsing it with Beautiful Soup. Then we’re extracting the text within a <div> tag with the class ‘entry-content’. Many WordPress themes put the main content of a blog post there; for other sites, adjust the selector to match the page’s markup.
Vectorizing the Content
Next, we’ll use the OpenAI API to generate a vector from the content:
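import os
import openai
import numpy as np

# openai.Embedding.create is the interface of openai versions below 1.0
openai.api_key = os.getenv('OPENAI_API_KEY')

def generate_vector(text):
    embedding = openai.Embedding.create(input=text, model="text-embedding-ada-002")
    vector = embedding["data"][0]["embedding"]
    # Serialize as float32 bytes so Redis can store the vector in a hash field
    vector = np.array(vector).astype(np.float32).tobytes()
    return vector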
Here, we’re sending our text to the OpenAI API and getting back a vector. This vector represents the semantic content of our text in a form that can be compared mathematically with other vectors.
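Embedding models accept a limited amount of input per request (text-embedding-ada-002 accepts up to 8191 tokens), so a long page may need to be split before it is sent. The sketch below splits by word count, which is only a rough proxy for tokens; the function name and the default of 500 words per chunk are arbitrary assumptions, not part of the original code.

```python
def split_into_chunks(text, max_words=500):
    """Split text into pieces of at most max_words whitespace-separated words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

sample = "one two three four five six seven"
chunks = split_into_chunks(sample, max_words=3)
print(chunks)
```

Each chunk could then be passed to generate_vector separately and stored under its own key.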
Storing the Vector
Finally, we’ll store the vector in a Redis database:
import redis

conn = redis.Redis(host='127.0.0.1', port=6379)

def store_vector(url, vector):
    post_hash = {
        "url": url,
        "embedding": vector
    }
    # hmset is deprecated in redis-py; hset with mapping= writes all fields at once
    conn.hset("post:" + url, mapping=post_hash)
In this code, we’re connecting to our Redis database and storing the vector as a hash. The key is the URL of the web page with the “post:” prefix, so the hash is covered by the index created earlier and we can easily retrieve the vector later using the URL.
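Reading a vector back means reversing the serialization step, because Redis returns the embedding field as raw bytes. The sketch below uses the standard-library array module so it runs without extra dependencies; the helper names are hypothetical, and the sample values stand in for a real embedding.

```python
from array import array

def encode_embedding(values):
    """Pack floats as native-order float32 bytes, like np.float32 .tobytes()."""
    return array("f", values).tobytes()

def decode_embedding(raw):
    """Turn float32 bytes read from Redis back into a list of floats."""
    floats = array("f")
    floats.frombytes(raw)
    return floats.tolist()

# Hypothetical round trip; with a live server, raw would come from
# conn.hget("post:" + url, "embedding")
raw = encode_embedding([0.5, -1.25, 2.0])
print(len(raw))
print(decode_embedding(raw))
# With NumPy, np.frombuffer(raw, dtype=np.float32) yields the same values
```

Each float32 value takes 4 bytes, so a 1536-dimensional embedding is stored as 6144 bytes.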
Putting It All Together
Now that we have all the pieces, we can create a function that fetches a web page, generates a vector from its content, and stores the vector in Redis:
def vectorize_url(url):
    content = fetch_content(url)
    vector = generate_vector(content)
    store_vector(url, vector)
And that’s it! We can now vectorize the content of any web page and store the vector for later use. For example, we could call vectorize_url('https://example.com/some-blog-post') to vectorize a specific blog post.
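When vectorizing several pages in a row, one failing page (a timeout, or a page without the expected content) should not stop the rest. Here is a minimal sketch of a batch helper; vectorize_many and fake_vectorize are hypothetical names, and the stand-in function replaces vectorize_url so the example runs without network access.

```python
def vectorize_many(urls, vectorize):
    """Call vectorize(url) for each URL and collect failures instead of stopping."""
    succeeded, failed = [], {}
    for url in urls:
        try:
            vectorize(url)
            succeeded.append(url)
        except Exception as exc:  # keep going; record why this URL failed
            failed[url] = str(exc)
    return succeeded, failed

# Stand-in for vectorize_url so the sketch runs without network access
def fake_vectorize(url):
    if "missing" in url:
        raise ValueError("No entry-content div found")

ok, errors = vectorize_many(
    ["https://example.com/a", "https://example.com/missing"],
    fake_vectorize,
)
print(ok)
print(errors)
```

In real use, you would pass vectorize_url as the second argument.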
Final Code
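Create a new project folder, copy and paste the code below into a file named something like anylinkredis.py, and save it there. Change the link in the code to the page you want to vectorize. Next, create a .env file in the same directory and add your OpenAI API key and Redis information. Note that os.getenv only reads variables from the process environment, so either export these values in your shell or load the .env file first, for example with the python-dotenv package.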
from bs4 import BeautifulSoup
import openai
import redis
import numpy as np
import os
import requests

# OpenAI API key and Redis settings, read from environment variables
openai.api_key = os.getenv('OPENAI_API_KEY')
redis_host = os.getenv('REDIS_HOST') or '127.0.0.1'
redis_port = int(os.getenv('REDIS_PORT') or 6379)
redis_password = os.getenv('REDIS_PASSWORD') or None

# Connect to the Redis server
conn = redis.Redis(host=redis_host, port=redis_port, password=redis_password)

def vectorize_url(post_url):
    # Fetch the blog post
    r = requests.get(post_url)
    post_soup = BeautifulSoup(r.text, 'html.parser')
    post_text = post_soup.find('div', {'class': 'entry-content'}).text
    print(f"Fetched blog post from {post_url}")

    # Generate the vector
    embedding = openai.Embedding.create(
        input=post_text,
        model="text-embedding-ada-002"
    )
    vector = embedding["data"][0]["embedding"]
    vector = np.array(vector).astype(np.float32).tobytes()  # Serialize the vector
    print(f"Generated vector for {post_url}")

    # Store in Redis
    post_hash = {
        "url": post_url,
        "embedding": vector
    }
    for key, value in post_hash.items():
        conn.hset("post:" + post_url, key, value)
    print(f"Stored vector in Redis for {post_url}")

# Test the function with a specific URL
vectorize_url('https://innerinetcompany.com/2023/07/09/use-openai-to-vectorize-content-and-store-in-redis-python-code-within/')
Example for .env
OPENAI_API_KEY=your_openai_api_key
REDIS_HOST=127.0.0.1
REDIS_PORT=6379
REDIS_PASSWORD=
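Since os.getenv does not read a .env file by itself, the values have to reach the process environment somehow. The usual route is the python-dotenv package (from dotenv import load_dotenv; load_dotenv()). As a dependency-free alternative, here is a minimal sketch of a loader; load_env_file is a hypothetical helper that handles only plain KEY=VALUE lines, and it overwrites variables that are already set.

```python
import os
import tempfile

def load_env_file(path):
    """Read KEY=VALUE lines from path into os.environ; return the parsed dict."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            loaded[key.strip()] = value.strip()
    os.environ.update(loaded)
    return loaded

# Demonstration with a temporary file standing in for the real .env
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("REDIS_HOST=127.0.0.1\nREDIS_PORT=6379\nREDIS_PASSWORD=\n")
    settings = load_env_file(env_path)

print(settings)
```

In the script, the loader would be called with the path to the real .env file before the os.getenv lines run.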
Conclusion
This simple code combines OpenAI’s text embeddings with Redis’s fast data retrieval to build a system that can compare pieces of text efficiently. The same approach can serve recommendation engines, search engines, and other applications that need to understand and compare text.
Vectorize .txt files and store to Redis
Here is a simple example that vectorizes the text content of .txt files using the OpenAI API and stores the result in a Redis database. Note that this won’t work for other file types (e.g., images, PDFs, Word documents), but it’s a start:
import redis
import openai
import numpy as np
import os

# OpenAI API key and Redis settings, read from environment variables
openai.api_key = os.getenv('OPENAI_API_KEY')
redis_host = os.getenv('REDIS_HOST') or '127.0.0.1'
redis_port = int(os.getenv('REDIS_PORT') or 6379)
redis_password = os.getenv('REDIS_PASSWORD') or None

# Connect to the Redis server
conn = redis.Redis(host=redis_host, port=redis_port, password=redis_password)

def vectorize_file(file_path):
    # Open the file and read the content
    with open(file_path, 'r') as file:
        content = file.read()

    # Generate the vector
    embedding = openai.Embedding.create(
        input=content,
        model="text-embedding-ada-002"
    )
    vector = embedding["data"][0]["embedding"]
    vector = np.array(vector).astype(np.float32).tobytes()  # Serialize the vector
    print("Vector has been generated.")

    # Store in Redis
    file_hash = {
        "path": file_path,
        "embedding": vector
    }
    for key, value in file_hash.items():
        conn.hset("file:" + file_path, key, value)
    print("Vector has been stored in Redis.")

# Test the function with a specific file
vectorize_file('/path/to/your/file.txt')
Change /path/to/your/file.txt in the last line to the path of your own text file.
In this script, vectorize_file is a function that takes a file path as an argument. It opens the file, reads the content, generates a vector from the content using the OpenAI API, and stores the vector in a Redis database.
You can call vectorize_file with the path to any text file to vectorize its content and store the vector in Redis. For example, you could call vectorize_file('/path/to/your/file.txt') to vectorize a specific file.
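To process a whole folder rather than one file, the same function can be applied to every .txt file in a directory. This is a minimal sketch; vectorize_directory is a hypothetical helper, and the demonstration passes a stand-in (seen.append) where vectorize_file would go so that it runs without OpenAI or Redis.

```python
import tempfile
from pathlib import Path

def vectorize_directory(directory, vectorize):
    """Call vectorize(path) for every .txt file directly inside directory."""
    processed = []
    for path in sorted(Path(directory).glob("*.txt")):
        vectorize(str(path))
        processed.append(path.name)
    return processed

# Demonstration with temporary files and a stand-in for vectorize_file
with tempfile.TemporaryDirectory() as tmp:
    for name in ("b.txt", "a.txt", "notes.md"):
        Path(tmp, name).write_text("hello", encoding="utf-8")
    seen = []
    done = vectorize_directory(tmp, seen.append)

print(done)
```

Files with other extensions, such as notes.md here, are skipped.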
Prompt engineered with OpenAI Code Interpreter.
Embeddings? Vectorize-to-Redis
Command for redis-cli in RedisInsight:
FT.SEARCH posts https
The command above runs a full-text search for the term “https” on my “posts” index. It finds the 9 posts whose url contains that term and outputs the stored embedding for each one.
1) "9"
2) "post:https://innerinetcompany.com/2018/01/23/first-blog-post/"
3) 1) "embedding"
   2) "the actual very long output of embeddings for the post"

Perform Redis Vector Queries
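Using redis-cli in RedisInsight, the command format is:
FT.SEARCH yourIndexName "word or words to query"
FT.SEARCH posts "innerinetcompany.com"
Because the index only contains the url TEXT field, this is a full-text match on the URL rather than a similarity search over the embeddings. It outputs the number of results, the key of each match, and the stored embedding:
1) "11"
2) "post:https://innerinetcompany.com/2018/01/23/first-blog-post/"
3) 1) "embedding"
   2) "\xe5\x02_<b\xa7\xf1\xb ...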
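The searches above match text in the url field. A similarity (KNN) search over the stored embeddings additionally requires the embedding field to be declared as a VECTOR field in the index schema. The sketch below only builds the argument lists for such an index and query; it assumes the existing “posts” index would be dropped and recreated, that vectors are 1536-dimensional float32 values, and the helper names and the score alias are hypothetical.

```python
from array import array

def knn_index_args(index="posts", prefix="post:", dim=1536):
    """Arguments for FT.CREATE with a TEXT url field and a VECTOR embedding field."""
    return [
        "FT.CREATE", index, "ON", "HASH", "PREFIX", 1, prefix,
        "SCHEMA", "url", "TEXT",
        "embedding", "VECTOR", "FLAT", 6,
        "TYPE", "FLOAT32", "DIM", dim, "DISTANCE_METRIC", "COSINE",
    ]

def knn_search_args(query_vector, index="posts", k=3):
    """Arguments for an FT.SEARCH that returns the k nearest embeddings."""
    blob = array("f", query_vector).tobytes()  # float32 bytes, as stored
    return [
        "FT.SEARCH", index, f"*=>[KNN {k} @embedding $vec AS score]",
        "RETURN", 2, "url", "score",
        "SORTBY", "score",
        "PARAMS", 2, "vec", blob,
        "DIALECT", 2,
    ]

create_args = knn_index_args()
search_args = knn_search_args([0.0] * 1536)  # placeholder query vector
print(create_args)
print(search_args[:3])
# With a live connection: conn.execute_command(*create_args), then
# conn.execute_command(*search_args)
```

In practice, the query vector would come from embedding the search text with the same model that produced the stored vectors.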
GEOADD
In redis-cli, add geospatial items to a key in the form (longitude, latitude, name). GEOADD stores them in a sorted set under that key, separate from the search index:
GEOADD yourindex:locations 86.9250 27.9867 "Mount Everest"
Or you can use locations as the key itself:
GEOADD locations 86.9250 27.9867 "Mount Everest"
Then look up the coordinates of a member with GEOPOS:
GEOPOS yourindex:locations "Mount Everest"
or
GEOPOS locations "Mount Everest"
The output is:
> GEOPOS locations "Mount Everest"
1) 1) "86.92500025033950806"
   2) "27.98669911504683938"
Learn more at Redis.io.
