MariaDB Vector Edition: Designed for AI


As a solutions architect with over two decades of experience in relational database systems, I recently started exploring MariaDB’s new Vector Edition to see if it could address some of the AI data challenges we’re facing. A first look was promising, especially the prospect of bringing vector search directly into a regular relational database. Still, I wanted to test it with a simple use case to see how it performs in practice.

In this article, I will share my hands-on experience and observations about MariaDB’s vector capabilities by running a simple use case. Specifically, I will be loading sample customer reviews into MariaDB and performing fast similarity searches to find related reviews.

Environment Setup

My experiment started with setting up a Docker container from the MariaDB 11.6 vector preview image, which includes the new vector capabilities.

Shell

```shell
# Pull the MariaDB 11.6 vector preview image
docker pull quay.io/mariadb-foundation/mariadb-devel:11.6-vector-preview

# Start the server (replace the password) and publish port 3306 so clients on the host can connect
docker run -d --name mariadb_vector -p 3306:3306 -e MYSQL_ROOT_PASSWORD=<replace_password> quay.io/mariadb-foundation/mariadb-devel:11.6-vector-preview
```

Next, I created a table and loaded it with sample customer reviews, each with a sentiment score and an embedding. To generate the text embeddings, I used the SentenceTransformers library, which provides pre-trained models. Specifically, I chose paraphrase-MiniLM-L6-v2, which maps each customer review to a 384-dimensional vector.

Python

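```python
import mysql.connector
import numpy as np
from sentence_transformers import SentenceTransformer

# paraphrase-MiniLM-L6-v2 maps each text to a 384-dimensional embedding.
model = SentenceTransformer('paraphrase-MiniLM-L6-v2')

# Assumes a database named vectordb already exists.
connection = mysql.connector.connect(
    host="localhost",
    user="root",
    password="<password>",  # Replace me
    database="vectordb"
)
cursor = connection.cursor()

# Create a table to store customer reviews with sentiment scores and embeddings.
# Column names match the INSERT and SELECT statements used below.
# VECTOR(384) and VECTOR INDEX follow the syntax of released MariaDB versions;
# the 11.6 preview image may expect the column to be declared BLOB NOT NULL instead.
# The vector index needs a NOT NULL column. ColumnStore does not support indexes
# (or AUTO_INCREMENT primary keys) like this, so the table uses InnoDB.
cursor.execute("""
    CREATE TABLE IF NOT EXISTS customer_reviews (
        id INT PRIMARY KEY AUTO_INCREMENT,
        product_id INT,
        review_text TEXT,
        sentiment_score FLOAT,
        review_embedding VECTOR(384) NOT NULL,
        VECTOR INDEX (review_embedding)
    ) ENGINE=InnoDB;
""")

# Sample reviews: (product_id, review_text, sentiment_score)
reviews = [
    (1, "This product exceeded my expectations. Highly recommended!", 0.9),
    (1, "Decent quality, but pricey.", 0.6),
    (2, "Terrible experience. The product does not work.", 0.1),
    (2, "Average product, ok ok", 0.5),
    (3, "Absolutely love it! Best purchase I have made this year.", 1.0)
]

# Load the sample reviews and their embeddings into MariaDB
for product_id, review_text, sentiment_score in reviews:
    # tobytes() returns the raw float32 bytes (little-endian on typical x86-64/ARM hosts),
    # which is the binary layout MariaDB uses for vectors.
    embedding = model.encode(review_text).astype(np.float32)
    cursor.execute(
        "INSERT INTO customer_reviews (product_id, review_text, sentiment_score, review_embedding) "
        "VALUES (%s, %s, %s, %s)",
        (product_id, review_text, sentiment_score, embedding.tobytes()))

connection.commit()
# Keep the connection open for the similarity search; call connection.close() when done.
```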
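The insert loop stores each embedding as raw bytes. As I understand MariaDB’s vector documentation, a stored vector is a sequence of little-endian 4-byte IEEE 754 floats, so a 384-dimensional embedding takes 1,536 bytes. The stdlib-only sketch below makes that layout explicit under that assumption; the helper names are hypothetical, and the JSON-style text form is the input format of MariaDB’s VEC_FromText() function.

```python
import json
import struct

# Assumption: MariaDB stores a vector as consecutive little-endian 4-byte floats.
def to_vector_bytes(values):
    """Pack floats as little-endian float32 (what embedding.tobytes() yields on x86-64)."""
    return struct.pack(f"<{len(values)}f", *values)

def from_vector_bytes(blob):
    """Unpack little-endian float32 bytes back into a list of Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))

def to_vector_text(values):
    """JSON-style array text, the input format of VEC_FromText()."""
    return json.dumps([float(v) for v in values])

vec = [0.5, -1.25, 2.0, 0.0]    # toy 4-dimensional vector; real embeddings have 384
blob = to_vector_bytes(vec)
print(len(blob))                # 16: four floats x 4 bytes
print(from_vector_bytes(blob))  # [0.5, -1.25, 2.0, 0.0]
print(to_vector_text(vec))      # [0.5, -1.25, 2.0, 0.0]
print(384 * 4)                  # 1536 bytes per stored review embedding
```

With the text form, an insert could pass `VEC_FromText(%s)` with `to_vector_text(embedding)` instead of raw bytes; it is easier to inspect in query logs, at the cost of a larger payload.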

Now, let’s use MariaDB’s vector capabilities to find similar reviews. This is akin to asking, “What did other customers say that is similar to this review?” In the example below, I find the top 2 reviews most similar to a customer review that says “I am super satisfied!” To do this, I use one of the vector functions available in this release, VEC_Distance_Euclidean, which returns the Euclidean distance between two vectors: the smaller the distance, the more similar the reviews.

Python

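```python
# Reuses `model`, `np`, `connection`, and the open `cursor` from the previous snippet.

# Convert the target customer review into a vector
target_review_embedding = model.encode("I am super satisfied!").astype(np.float32)

# Find the 2 most similar reviews using MariaDB's VEC_Distance_Euclidean function.
# A smaller distance means a closer match, so results are sorted in ascending order.
cursor.execute("""
    SELECT review_text, sentiment_score,
           VEC_Distance_Euclidean(review_embedding, %s) AS distance
    FROM customer_reviews
    ORDER BY distance
    LIMIT %s
""", (target_review_embedding.tobytes(), 2))

similar_reviews = cursor.fetchall()
for review_text, sentiment_score, distance in similar_reviews:
    print(f"{distance:.4f}  {sentiment_score}  {review_text}")

connection.close()
```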
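To make the query’s semantics concrete, here is a minimal pure-Python sketch of what it computes when no index is involved: the Euclidean distance from the target embedding to every stored embedding, keeping the k smallest. The optional `min_sentiment` filter stands in for a WHERE clause on the structured `sentiment_score` column. The function names and the 2-dimensional toy embeddings are hypothetical; real rows carry 384-dimensional vectors.

```python
import heapq
import math

def euclidean_distance(a, b):
    """sqrt(sum((a_i - b_i)^2)): the quantity VEC_Distance_Euclidean returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k_similar(query, rows, k=2, min_sentiment=None):
    """Exact scan equivalent to ORDER BY distance LIMIT k without an index.

    rows: iterable of (review_text, sentiment_score, embedding) tuples.
    min_sentiment: optional filter, analogous to WHERE sentiment_score >= ...
    Returns up to k (review_text, sentiment_score, distance) tuples, closest first.
    """
    scored = (
        (text, score, euclidean_distance(query, emb))
        for text, score, emb in rows
        if min_sentiment is None or score >= min_sentiment
    )
    # heapq.nsmallest keeps only k items in memory while scanning every row.
    return heapq.nsmallest(k, scored, key=lambda row: row[2])

# Hypothetical 2-dimensional embeddings, for illustration only.
rows = [
    ("This product exceeded my expectations.", 0.9, [0.9, 0.1]),
    ("Decent quality, but pricey.", 0.6, [0.5, 0.5]),
    ("The product does not work.", 0.1, [0.0, 1.0]),
]
query = [1.0, 0.0]

for text, score, dist in top_k_similar(query, rows, k=2):
    print(f"{dist:.3f}  {score}  {text}")
# 0.141  0.9  This product exceeded my expectations.
# 0.707  0.6  Decent quality, but pricey.

print([text for text, _, _ in top_k_similar(query, rows, k=2, min_sentiment=0.7)])
# ['This product exceeded my expectations.']
```

This full scan costs one distance computation per row, which is exactly the work an approximate vector index is meant to avoid on large tables.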

Observations

- It is easy to set up, and a single table can hold structured data (like product IDs and sentiment scores), unstructured data (review text), and their vector representations.
- I like being able to use SQL syntax alongside vector operations, which makes adoption easy for teams already familiar with relational databases. MariaDB’s documentation lists the full set of vector functions supported in this release.
- On the larger datasets I have tried so far, the HNSW-based vector index improved the performance of similarity search queries.
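MariaDB describes its vector index as based on HNSW (Hierarchical Navigable Small World graphs), to my understanding in a modified form. The sketch below is not MariaDB’s code; it is a textbook version of the per-layer search HNSW performs: a best-first walk over a proximity graph that keeps the `ef` closest nodes seen so far and stops once no unexpanded candidate can improve them. Full HNSW repeats this over a stack of layers, descending from a sparse top layer to the dense bottom one. The toy graph, coordinates, and function name are hypothetical.

```python
import heapq
import math

def search_layer(graph, vectors, query, entry, ef):
    """Best-first search over one proximity-graph layer, as in HNSW's per-layer search.

    graph:   node id -> list of neighbour ids (hypothetical toy structure)
    vectors: node id -> coordinates
    ef:      size of the result list; larger ef explores more and improves recall
    Returns up to ef (distance, node) pairs, closest first.
    """
    d0 = math.dist(query, vectors[entry])
    visited = {entry}
    candidates = [(d0, entry)]   # min-heap: closest unexpanded node on top
    results = [(-d0, entry)]     # max-heap via negated distance: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # closest remaining candidate is farther than the worst result
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, vectors[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    return sorted((-neg_d, n) for neg_d, n in results)

# Toy data: five points on a line, each linked to its immediate neighbours.
vectors = {i: (float(i), 0.0) for i in range(5)}
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

for dist, node in search_layer(graph, vectors, query=(3.2, 0.0), entry=0, ef=2):
    print(node, round(dist, 2))
# 3 0.2
# 4 0.8
```

Because the walk only follows edges toward closer neighbours, on a large, well-connected graph it typically visits a small fraction of the nodes, which is why such an index can speed up similarity search on bigger tables. The trade-off is that results are approximate, with `ef` balancing speed against recall.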

Conclusion

Overall, I am impressed! MariaDB’s Vector Edition has the potential to simplify certain AI-driven architectures by keeping embeddings in the same database as the relational data they describe. It bridges the gap between the traditional database world and the evolving demands of AI tools. In the coming months, I look forward to seeing how this technology matures and how the community adopts it in real-world applications.

Source:
https://dzone.com/articles/mariadb-vector-edition-hands-on-review