With the ever-increasing awareness of brand culture, companies are keeping a closer eye on any threats that could harm their intellectual property (IP) rights. IP infringement includes:
- Patent infringement
- Trademark infringement
- Design infringement
A Vector Similarity Search System for Trademarks
To build a vector similarity search system for trademarks, you need to go through the following steps:
- Prepare a massive dataset of logos. The system can likely use an existing logo dataset.
- Train an image feature extraction model on that dataset, using a data-driven model or AI algorithm.
- Convert the logos into vectors using the model or algorithm trained in the previous step.
- Store the vectors and conduct vector similarity searches in Milvus, the open-source vector database.
When the system is built, your user only needs to upload an image of a logo. The system converts this new image into a new vector using the same AI model you trained, searches the Milvus database for vectors similar to it, and returns the corresponding vector IDs, which identify the matching logos. Ultimately, your user can see all the logos that are similar to the one they uploaded. The following screenshot is a demonstration of a vector similarity search system for trademarks. As you can see, the user uploaded the swoosh logo of the sportswear brand Nike, and the system returned all images that are similar to it.
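The query flow described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how Milvus works internally: it uses a brute-force scan with cosine similarity as a stand-in for the database, and the IDs, the three-dimensional vectors, and the function names are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def top_k_similar(query, stored, k=3):
    """Return the k (id, score) pairs most similar to `query`.

    `stored` maps a vector ID to its embedding. This brute-force scan is a
    stand-in for the vector database, used here for illustration only.
    """
    scored = [(vid, cosine_similarity(query, vec)) for vid, vec in stored.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical embeddings of logos that are already in the system.
stored_logos = {
    101: [0.9, 0.1, 0.0],
    102: [0.0, 1.0, 0.0],
    103: [0.8, 0.2, 0.1],
}

# Hypothetical embedding of the logo the user just uploaded.
query_vector = [1.0, 0.0, 0.0]

results = top_k_similar(query_vector, stored_logos, k=2)
result_ids = [vid for vid, _ in results]
print(result_ids)  # IDs of the two most similar stored logos
```

A scan like this compares the query with every stored vector, so its cost grows linearly with the dataset. That is the main reason a dedicated vector database is used once the logo collection becomes large.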
In the following sections, let’s take a closer look at the two major steps in building a vector similarity search system for trademarks: using AI models for image feature extraction, and using Milvus for vector similarity search. In our case, we used VGG16, a convolutional neural network (CNN), to extract image features and convert them into embedding vectors.
Using VGG16 For Image Feature Extraction
The VGG16 model, as its name suggests, is a CNN with 16 weight layers: 13 convolutional layers and 3 fully connected layers. All VGG models, including VGG16 and VGG19, contain 5 VGG blocks, with one or more convolutional layers in each VGG block. At the end of each block, a max-pooling layer reduces the spatial size of the feature maps. The number of kernels is the same for every convolutional layer within a block and doubles from one block to the next until it reaches 512. Therefore, the number of kernels in the model grows from 64 in the first block to 512 in the fourth and fifth blocks. All the convolutional kernels are 3×3 and all the pooling kernels are 2×2. Using small kernels like these is conducive to preserving more information about the input image.
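The layer counts above are easy to check with a little arithmetic. The sketch below encodes the standard VGG16 block layout (convolutional layers and kernels per block) and assumes the usual 224×224 input, 3×3 convolutions that keep the spatial size, and 2×2 max pooling with stride 2. The variable and function names are illustrative only.

```python
# (number of conv layers, number of kernels) for each of the 5 VGG16 blocks.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]
FULLY_CONNECTED_LAYERS = 3


def summarize(blocks, input_size=224):
    """Return (block index, conv layers, kernels, output size) per block."""
    size = input_size
    rows = []
    for index, (n_convs, kernels) in enumerate(blocks, start=1):
        # The padded 3x3 convolutions leave the spatial size unchanged;
        # the 2x2 max pooling at the end of the block halves it.
        size //= 2
        rows.append((index, n_convs, kernels, size))
    return rows


layout = summarize(VGG16_BLOCKS)
conv_layers = sum(n_convs for n_convs, _ in VGG16_BLOCKS)
weight_layers = conv_layers + FULLY_CONNECTED_LAYERS

for index, n_convs, kernels, size in layout:
    print(f"block {index}: {n_convs} conv layers, {kernels} kernels, "
          f"output {size}x{size}")
print(f"{conv_layers} conv + {FULLY_CONNECTED_LAYERS} fully connected "
      f"= {weight_layers} weight layers")
```

The 13 convolutional layers plus 3 fully connected layers give the 16 in the model's name, and the five pooling steps shrink a 224×224 input to a 7×7 feature map with 512 channels.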
Therefore, VGG16 is a suitable model for image recognition on massive datasets in this case. You can use Python, TensorFlow, and Keras to train an image feature extraction model based on VGG16.
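As a rough starting point, the sketch below shows one way to turn a logo image into an embedding with the VGG16 model that ships with Keras. It makes several assumptions that are not part of the original system description: it uses the pretrained ImageNet weights as-is rather than a model trained on logos, it drops the fully connected layers and applies global max pooling to get a 512-dimensional vector, and it L2-normalizes the result. The function names and the image path are hypothetical.

```python
import math


def l2_normalize(values):
    """Scale a vector to unit length so similarity scores are comparable."""
    norm = math.sqrt(sum(v * v for v in values))
    if norm == 0.0:
        return list(values)
    return [v / norm for v in values]


def build_extractor():
    """Load VGG16 without its classifier head (assumption: ImageNet weights)."""
    # Imported lazily so the helper above works without TensorFlow installed.
    from tensorflow.keras.applications.vgg16 import VGG16

    # include_top=False removes the 3 fully connected layers;
    # pooling="max" applies global max pooling, giving 512 values per image.
    return VGG16(weights="imagenet", include_top=False, pooling="max")


def extract_embedding(model, image_path):
    """Return a unit-length list of floats describing the image."""
    import numpy as np
    from tensorflow.keras.applications.vgg16 import preprocess_input
    from tensorflow.keras.utils import img_to_array, load_img

    image = load_img(image_path, target_size=(224, 224))
    batch = np.expand_dims(img_to_array(image), axis=0)  # shape (1, 224, 224, 3)
    features = model.predict(preprocess_input(batch))    # shape (1, 512)
    return l2_normalize(features[0].tolist())


# Example usage (requires TensorFlow and a local image file):
# model = build_extractor()
# vector = extract_embedding(model, "logo.png")  # hypothetical path
```

Training or fine-tuning on a logo dataset, as described in the steps earlier, would replace the pretrained weights here. The extraction code itself would stay largely the same.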
Using Milvus For Vector Similarity Search
After using the VGG16 model to extract image features and convert logo images into embedding vectors, you need to search for similar vectors in a massive dataset.
Milvus is a cloud-native database featuring high scalability and elasticity. As a database, it can also ensure data consistency. In a trademark similarity search system like this one, new data such as the latest trademark registrations is uploaded in real time, and this newly uploaded data needs to be available for search immediately. Therefore, this article adopts Milvus, the open-source vector database, to conduct vector similarity searches.
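To make the storage and search step more concrete, here is a minimal sketch using the `MilvusClient` interface from the `pymilvus` package. It assumes a recent `pymilvus` release, a reachable Milvus instance at the given URI, and the default `id` and `vector` field names of the client's quick-setup collection. The collection name `logos`, the helper names, and the choice of cosine similarity are assumptions made for this example, not details of the original system.

```python
def to_rows(ids, vectors):
    """Pair each logo ID with its embedding in the row format used for inserts."""
    if len(ids) != len(vectors):
        raise ValueError("ids and vectors must have the same length")
    return [{"id": i, "vector": list(v)} for i, v in zip(ids, vectors)]


def index_and_search(uri, rows, query, dim, limit=5, collection="logos"):
    """Insert rows into Milvus and return the IDs of the closest vectors."""
    # Imported lazily so to_rows() can be used without pymilvus installed.
    from pymilvus import MilvusClient

    client = MilvusClient(uri=uri)
    if not client.has_collection(collection):
        # Quick-setup collection: an integer "id" primary key and a
        # float "vector" field with `dim` dimensions.
        client.create_collection(
            collection_name=collection,
            dimension=dim,
            metric_type="COSINE",
        )
    client.insert(collection_name=collection, data=rows)

    # One query vector in, one list of hits out.
    hits = client.search(collection_name=collection, data=[query], limit=limit)
    return [hit["id"] for hit in hits[0]]


# Example usage (requires a running Milvus instance; the URI is hypothetical):
# rows = to_rows([101, 102], [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]])
# print(index_and_search("http://localhost:19530", rows, [1.0, 0.0, 0.0], dim=3))
```

How quickly newly inserted vectors show up in search results depends on the consistency settings of the collection, so a production system would want to set those deliberately rather than rely on defaults.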