
Create Structured Index

Structured indexes in Marqo are tailored for datasets with a defined schema and are particularly effective for complex queries like sorting, grouping, and filtering. They are designed for fast, in-memory operations.


To create your structured index:

POST /indexes/{index_name}

Create an index with optional settings. This endpoint accepts the application/json content type.

Path parameters

| Name | Type | Description |
|---|---|---|
| index_name | String | Name of the index |

Body Parameters

The index settings are passed in the request body as a nested JSON object. Any parameter you omit takes the default value listed below. The parameters are as follows:

| Name | Type | Default value | Description |
|---|---|---|---|
| allFields | List | - | List of fields that might be indexed or queried. Valid only if type is structured |
| tensorFields | List | [] | List of fields that are treated as tensors |
| model | String | hf/e5-base-v2 | The model used to vectorise document content in add_documents() calls for the index |
| modelProperties | Dictionary | "" | The model properties object corresponding to model (for custom models) |
| normalizeEmbeddings | Boolean | true | Normalize the embeddings to have unit length |
| textPreprocessing | Dictionary | "" | The text preprocessing object |
| imagePreprocessing | Dictionary | "" | The image preprocessing object |
| videoPreprocessing | Dictionary | "" | The video preprocessing object |
| audioPreprocessing | Dictionary | "" | The audio preprocessing object |
| annParameters | Dictionary | "" | The ANN algorithm parameter object |
| type | String | unstructured | Type of the index. The default is unstructured; set it to structured to create a structured index |
| vectorNumericType | String | float | Numeric type used to encode vectors |

Note: these body parameters are used by both Marqo Open-Source and Marqo Cloud. Marqo Cloud also accepts the additional body parameters described below.

Additional Marqo Cloud Body Parameters

Marqo Cloud creates dedicated infrastructure for each index. When you create an index, you can set the storage type with storageClass and the inference type with inferenceType. numberOfShards sets the number of storage instances, numberOfReplicas sets the number of replicas, and numberOfInferences sets the number of Marqo inference nodes. These parameters are supported only on Marqo Cloud, not on Marqo Open-Source.

| Name | Type | Default value | Description |
|---|---|---|---|
| inferenceType | String | marqo.CPU.small | Type of inference for the index. Options are "marqo.CPU.small" (deprecated), "marqo.CPU.large", "marqo.GPU" |
| storageClass | String | marqo.basic | Type of storage for the index. Options are "marqo.basic", "marqo.balanced.storage", "marqo.balanced.throughput", "marqo.performance" |
| numberOfShards | Integer | 1 | The number of shards for the index |
| numberOfReplicas | Integer | 0 | The number of replicas for the index |
| numberOfInferences | Integer | 1 | The number of inference nodes for the index |
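On the wire, the Cloud-only settings are simply extra top-level keys in the same JSON body as the common parameters (compare the Open-Source and Cloud cURL requests in the examples further down). The sketch below shows that merge in Python. with_cloud_params and EXAMPLE_CLOUD_PARAMS are hypothetical names, not part of the Marqo client, and the values are the ones used in this page's examples rather than recommendations or defaults.

```python
import copy

# Cloud-only parameters, using the values from the examples on this page.
# Hypothetical constant: these are NOT the documented defaults.
EXAMPLE_CLOUD_PARAMS = {
    "inferenceType": "marqo.CPU.large",
    "storageClass": "marqo.basic",
    "numberOfShards": 1,
    "numberOfReplicas": 0,
    "numberOfInferences": 1,
}


def with_cloud_params(settings, **overrides):
    """Return a copy of `settings` with Marqo Cloud parameters added at the top level.

    Hypothetical helper; Marqo does not ship this function.
    """
    unknown = set(overrides) - set(EXAMPLE_CLOUD_PARAMS)
    if unknown:
        raise KeyError(f"not a Marqo Cloud parameter: {sorted(unknown)}")
    merged = copy.deepcopy(settings)  # leave the caller's dict untouched
    merged.update(EXAMPLE_CLOUD_PARAMS)
    merged.update(overrides)
    return merged


base = {"type": "structured", "allFields": [], "tensorFields": []}
cloud_settings = with_cloud_params(base, numberOfReplicas=1)
```

Keeping the common settings in one dict and layering the Cloud keys on top lets the same schema be reused for a local Open-Source index and a Cloud index.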

Fields

The allFields list contains the fields that might be indexed or queried. Each field is an object with the following parameters:

| Name | Type | Default value | Description |
|---|---|---|---|
| name | String | - | Name of the field |
| type | String | - | Type of the field |
| features | List | [] | List of features that the field supports |

Available types are:

| Field type | Description | Supported features |
|---|---|---|
| `text` | Text field | lexical_search, filter |
| `int` | 32-bit integer | filter, score_modifier |
| `float` | 32-bit float | filter, score_modifier |
| `long` | 64-bit integer | filter, score_modifier |
| `double` | 64-bit float | filter, score_modifier |
| `array<text>` | Array of text | lexical_search, filter |
| `array<int>` | Array of 32-bit integers | filter |
| `array<float>` | Array of 32-bit floats | filter |
| `array<long>` | Array of 64-bit integers | filter |
| `array<double>` | Array of 64-bit floats | filter |
| `bool` | Boolean | filter |
| `multimodal_combination` | Multimodal combination field | None |
| `image_pointer` | Image URL. Must only be used with a multimodal model such as CLIP | None |
| `video_pointer` | Video URL. Must only be used with a multimodal model such as LanguageBind | None |
| `audio_pointer` | Audio URL. Must only be used with a multimodal model such as LanguageBind | None |
| `custom_vector` | Custom vector, with optional text for lexical search and filtering | lexical_search, filter |
| `map<text, int>` | Map of text to integers | score_modifier |
| `map<text, long>` | Map of text to longs | score_modifier |
| `map<text, float>` | Map of text to floats | score_modifier |
| `map<text, double>` | Map of text to doubles | score_modifier |

Available features are:

  • lexical_search: The field can be used for lexical search
  • filter: The field can be used for exact-match filtering and, for numerical fields, range filtering
  • score_modifier: The field can be used to modify the score of the document
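Because each field type supports only the features listed in the table above, a mismatch is easy to catch before calling the endpoint. The sketch below encodes that table as a Python dict and reports fields that request a feature their type does not support. feature_errors and SUPPORTED_FEATURES are hypothetical names; the type spellings (for example the space in map<text, int>) are copied from the table, and Marqo performs its own server-side validation, which this does not replace.

```python
# Supported features per field type, transcribed from the field-type table.
SUPPORTED_FEATURES = {
    "text": {"lexical_search", "filter"},
    "int": {"filter", "score_modifier"},
    "float": {"filter", "score_modifier"},
    "long": {"filter", "score_modifier"},
    "double": {"filter", "score_modifier"},
    "array<text>": {"lexical_search", "filter"},
    "array<int>": {"filter"},
    "array<float>": {"filter"},
    "array<long>": {"filter"},
    "array<double>": {"filter"},
    "bool": {"filter"},
    "multimodal_combination": set(),
    "image_pointer": set(),
    "video_pointer": set(),
    "audio_pointer": set(),
    "custom_vector": {"lexical_search", "filter"},
    "map<text, int>": {"score_modifier"},
    "map<text, long>": {"score_modifier"},
    "map<text, float>": {"score_modifier"},
    "map<text, double>": {"score_modifier"},
}


def feature_errors(all_fields):
    """Return a list of problems in an allFields list (empty means none found).

    Hypothetical client-side check, not part of Marqo.
    """
    errors = []
    for field in all_fields:
        name, ftype = field["name"], field["type"]
        if ftype not in SUPPORTED_FEATURES:
            errors.append(f"{name}: unknown type {ftype!r}")
            continue
        unsupported = set(field.get("features", [])) - SUPPORTED_FEATURES[ftype]
        for feature in sorted(unsupported):
            errors.append(f"{name}: type {ftype!r} does not support {feature!r}")
    return errors


print(feature_errors([{"name": "tags", "type": "array<int>", "features": ["score_modifier"]}]))
```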

Every multimodal_combination field requires a dependentFields object, which defines the weights of the combination. dependentFields is a dictionary whose keys are the names of the fields combined into the multimodal field and whose values are the weight of each field. Each key must be the name of a field defined in allFields. See the example below for more details.
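The dependentFields rules lend themselves to a quick client-side check. The hypothetical helper below walks an allFields list and reports any multimodal_combination field that has no dependentFields or that references a field not defined in allFields. It also flags non-numeric weights, which is an assumption drawn from the weights being described as numbers, not a documented Marqo rule.

```python
def dependent_field_errors(all_fields):
    """Check multimodal_combination fields in an allFields list.

    Hypothetical helper; Marqo validates these rules itself on the server.
    """
    defined = {field["name"] for field in all_fields}
    errors = []
    for field in all_fields:
        if field["type"] != "multimodal_combination":
            continue
        deps = field.get("dependentFields")
        if not deps:
            errors.append(f"{field['name']}: dependentFields is required")
            continue
        for dep_name, weight in deps.items():
            if dep_name not in defined:
                errors.append(f"{field['name']}: {dep_name!r} is not defined in allFields")
            if not isinstance(weight, (int, float)):
                errors.append(f"{field['name']}: weight for {dep_name!r} must be a number")
    return errors


# The fields used by Example 1 below pass the check.
fields = [
    {"name": "text_field", "type": "text", "features": ["lexical_search"]},
    {"name": "image_field", "type": "image_pointer"},
    {
        "name": "multimodal_field",
        "type": "multimodal_combination",
        "dependentFields": {"image_field": 0.9, "text_field": 0.1},
    },
]
print(dependent_field_errors(fields))
```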

Text Preprocessing Object

The textPreprocessing object contains the specifics of how you want the index to preprocess text. The parameters are as follows:

| Name | Type | Default value | Description |
|---|---|---|---|
| splitLength | Integer | 2 | The length of the chunks after splitting by splitMethod |
| splitOverlap | Integer | 0 | The length of overlap between adjacent chunks |
| splitMethod | String | sentence | The method by which text is chunked (character, word, sentence, or passage) |
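To make the interaction between splitLength and splitOverlap concrete, here is a minimal sketch of sentence-based chunking in Python. It assumes a naive regex sentence splitter and a window that advances by splitLength - splitOverlap sentences; Marqo's own splitter may segment text differently (for example around abbreviations), so treat this as an illustration rather than Marqo's implementation.

```python
import re


def chunk_sentences(text, split_length=2, split_overlap=0):
    """Group sentences into overlapping chunks (illustrative sketch only)."""
    # Naive sentence split: break after ., ! or ? followed by whitespace (assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = split_length - split_overlap
    if step <= 0:
        raise ValueError("split_overlap must be smaller than split_length")
    chunks = []
    for start in range(0, len(sentences), step):
        chunks.append(" ".join(sentences[start:start + split_length]))
        # Stop once a window reaches the final sentence to avoid redundant tails.
        if start + split_length >= len(sentences):
            break
    return chunks


print(chunk_sentences("A. B. C. D."))
print(chunk_sentences("A. B. C. D.", split_length=2, split_overlap=1))
```

With the defaults (splitLength 2, splitOverlap 0) the four-sentence text yields two chunks of two sentences; with splitOverlap 1 it yields three chunks, each sharing one sentence with its neighbour.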

Image Preprocessing Object

The imagePreprocessing object contains the specifics of how you want the index to preprocess images. The parameters are as follows:

| Name | Type | Default value | Description |
|---|---|---|---|
| patchMethod | String | null | The method by which images are chunked (simple or frcnn) |

Video Preprocessing Object

The videoPreprocessing object contains the specifics of how you want the index to preprocess videos. The last chunk always starts at the total length of the video file minus splitLength, so it ends exactly at the end of the file.

The parameters are as follows:

| Name | Type | Default value | Description |
|---|---|---|---|
| splitLength | Integer | 20 | The length of each video chunk, in seconds |
| splitOverlap | Integer | 3 | The length of overlap between adjacent chunks, in seconds |

Audio Preprocessing Object

The audioPreprocessing object contains the specifics of how you want the index to preprocess audio. The last chunk always starts at the total length of the audio file minus splitLength, so it ends exactly at the end of the file.

The parameters are as follows:

| Name | Type | Default value | Description |
|---|---|---|---|
| splitLength | Integer | 10 | The length of each audio chunk, in seconds |
| splitOverlap | Integer | 3 | The length of overlap between adjacent chunks, in seconds |

ANN Algorithm Parameter object

The annParameters object contains hyperparameters for the approximate nearest neighbour algorithm used for tensor storage within Marqo. The parameters are as follows:

| Name | Type | Default value | Description |
|---|---|---|---|
| spaceType | String | prenormalized-angular | The function used to measure the distance between two points in ANN (angular, euclidean, dotproduct, geodegrees, hamming, or prenormalized-angular) |
| parameters | Dict | "" | The hyperparameters for the ANN method (which is always hnsw for Marqo) |
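Marqo 2 stores vectors in Vespa, and these spaceType names match Vespa's distance metrics. As the name suggests (and as Vespa documents for its metric of the same name), prenormalized-angular assumes every vector already has unit length, which normalizeEmbeddings: true provides. For unit vectors the dot product equals cosine similarity, so the norms never need to be computed at query time. The plain-Python check below illustrates that identity; it is a worked illustration, not Marqo code.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    """Scale v to unit length, which is what normalizeEmbeddings: true asks for."""
    norm = math.sqrt(dot(v, v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a, b):
    """Angular similarity computed from scratch, including both norms."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 2.0]
# Once both vectors are unit length, a plain dot product gives the same value.
print(cosine_similarity(a, b), dot(normalize(a), normalize(b)))
```

If embeddings are not normalized, the dot product and the angular similarity diverge, which is why prenormalized-angular pairs naturally with normalized embeddings.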

HNSW Method Parameters Object

parameters can have the following values:

| Name | Type | Default value | Description |
|---|---|---|---|
| efConstruction | Integer | 512 | The size of the dynamic list used during k-NN graph creation. Higher values produce a more accurate graph but slow down indexing. Keep this between 2 and 800 (the maximum is 4096) |
| m | Integer | 16 | The number of bidirectional links created for each new element in the graph. This value has a large impact on memory consumption; keep it between 2 and 100 |
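The limits above can be checked before sending a request. check_ann_parameters is a hypothetical client-side helper: it fills in the documented defaults, rejects an unknown spaceType or an efConstruction above the stated maximum of 4096, and only warns when efConstruction or m falls outside the recommended ranges. Marqo's server-side validation may be stricter or report errors differently.

```python
import warnings

# Space types listed in the ANN parameter table.
SPACE_TYPES = {
    "angular", "euclidean", "dotproduct",
    "geodegrees", "hamming", "prenormalized-angular",
}


def check_ann_parameters(ann):
    """Validate an annParameters dict and return it with defaults filled in.

    Hypothetical helper, not part of Marqo.
    """
    space = ann.get("spaceType", "prenormalized-angular")
    if space not in SPACE_TYPES:
        raise ValueError(f"unknown spaceType {space!r}")
    params = ann.get("parameters", {})
    ef = params.get("efConstruction", 512)
    m = params.get("m", 16)
    if ef > 4096:
        raise ValueError("efConstruction must not exceed 4096")
    if not 2 <= ef <= 800:
        warnings.warn(f"efConstruction={ef} is outside the recommended range 2-800")
    if not 2 <= m <= 100:
        warnings.warn(f"m={m} is outside the recommended range 2-100")
    return {"spaceType": space, "parameters": {"efConstruction": ef, "m": m}}


print(check_ann_parameters({"parameters": {"m": 32}}))
```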

Model Properties Object

The modelProperties object is used to set up models that are not available in Marqo by default (see the list of default models in the Marqo documentation). Its structure depends on the model.

For OpenCLIP models, see the OpenCLIP section of the Marqo models documentation for the modelProperties format and example usage.

For generic SBERT models, see the SBERT section of the Marqo models documentation for the modelProperties format and example usage.

Example 1: Creating a structured index for combining text and images

=== "Marqo Open-Source"

=== "cURL"

curl -X POST 'http://localhost:8882/indexes/my-first-structured-index' \
-H "Content-Type: application/json" \
-d '{
"type": "structured",
"vectorNumericType": "float",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"normalizeEmbeddings": true,
"textPreprocessing": {
"splitLength": 2,
"splitOverlap": 0,
"splitMethod": "sentence"
},
"allFields": [
{"name": "text_field", "type": "text", "features": ["lexical_search"]},
{"name": "caption", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "tags", "type": "array<text>", "features": ["filter"]},
{"name": "image_field", "type": "image_pointer"},
{"name": "my_int", "type": "int", "features": ["score_modifier"]},
{
"name": "multimodal_field",
"type": "multimodal_combination",
"dependentFields": {"image_field": 0.9, "text_field": 0.1}
}
],
"tensorFields": ["multimodal_field"],
"annParameters": {
"spaceType": "prenormalized-angular",
"parameters": {"efConstruction": 512, "m": 16}
}
}'

=== "python"

import marqo

settings = {
"type": "structured",
"vectorNumericType": "float",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"normalizeEmbeddings": True,
"textPreprocessing": {
"splitLength": 2,
"splitOverlap": 0,
"splitMethod": "sentence",
},
"imagePreprocessing": {"patchMethod": None},
"allFields": [
{"name": "text_field", "type": "text", "features": ["lexical_search"]},
{"name": "caption", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "tags", "type": "array<text>", "features": ["filter"]},
{"name": "image_field", "type": "image_pointer"},
{"name": "my_int", "type": "int", "features": ["score_modifier"]},
# This field combines image_field and text_field into one multimodal vector, weighted by dependentFields.
{
"name": "multimodal_field",
"type": "multimodal_combination",
"dependentFields": {"image_field": 0.9, "text_field": 0.1},
},
],
"tensorFields": ["multimodal_field"],
"annParameters": {
"spaceType": "prenormalized-angular",
"parameters": {"efConstruction": 512, "m": 16},
},
}

mq = marqo.Client(url="http://localhost:8882", api_key=None)

mq.create_index("my-first-structured-index", settings_dict=settings)

=== "Marqo Cloud"

=== "cURL"

curl -X POST 'https://api.marqo.ai/api/v2/indexes/my-first-structured-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"type": "structured",
"vectorNumericType": "float",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"normalizeEmbeddings": true,
"textPreprocessing": {
"splitLength": 2,
"splitOverlap": 0,
"splitMethod": "sentence"
},
"allFields": [
{"name": "text_field", "type": "text", "features": ["lexical_search"]},
{"name": "caption", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "tags", "type": "array<text>", "features": ["filter"]},
{"name": "image_field", "type": "image_pointer"},
{"name": "my_int", "type": "int", "features": ["score_modifier"]},
{
"name": "multimodal_field",
"type": "multimodal_combination",
"dependentFields": {"image_field": 0.9, "text_field": 0.1}
}
],
"tensorFields": ["multimodal_field"],
"annParameters": {
"spaceType": "prenormalized-angular",
"parameters": {"efConstruction": 512, "m": 16}
},
"numberOfShards": 1,
"numberOfReplicas": 0,
"inferenceType": "marqo.CPU.large",
"storageClass": "marqo.basic",
"numberOfInferences": 1
}'

=== "python"

import marqo

settings = {
"type": "structured",
"vectorNumericType": "float",
"model": "open_clip/ViT-B-32/laion2b_s34b_b79k",
"normalizeEmbeddings": True,
"textPreprocessing": {
"splitLength": 2,
"splitOverlap": 0,
"splitMethod": "sentence",
},
"imagePreprocessing": {"patchMethod": None},
"allFields": [
{"name": "text_field", "type": "text", "features": ["lexical_search"]},
{"name": "caption", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "tags", "type": "array<text>", "features": ["filter"]},
{"name": "image_field", "type": "image_pointer"},
{"name": "my_int", "type": "int", "features": ["score_modifier"]},
# This field combines image_field and text_field into one multimodal vector, weighted by dependentFields.
{
"name": "multimodal_field",
"type": "multimodal_combination",
"dependentFields": {"image_field": 0.9, "text_field": 0.1},
},
],
"tensorFields": ["multimodal_field"],
"annParameters": {
"spaceType": "prenormalized-angular",
"parameters": {"efConstruction": 512, "m": 16},
},
"numberOfShards": 1,
"numberOfReplicas": 0,
"inferenceType": "marqo.CPU.large",
"storageClass": "marqo.basic",
"numberOfInferences": 1,
}

mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")

mq.create_index("my-first-structured-index", settings_dict=settings)

Example 2: Creating a structured index with no model for use with custom vectors

=== "Marqo Open-Source"

=== "cURL"

curl -X POST 'http://localhost:8882/indexes/my-hybrid-index' \
-H "Content-Type: application/json" \
-d '{
"model": "no_model",
"modelProperties": {
"type": "no_model",
"dimensions": 3072
},
"type": "structured",
"allFields": [
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]}
],
"tensorFields": ["title"]
}'

=== "python"

import marqo

mq = marqo.Client("http://localhost:8882", api_key=None)

mq.create_index(
index_name="my-hybrid-index",
type="structured",
model="no_model",
model_properties={"type": "no_model", "dimensions": 3072},
all_fields=[
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{
"name": "description",
"type": "text",
"features": ["lexical_search", "filter"],
},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]},
],
tensor_fields=["title"],
)

=== "Marqo Cloud"

=== "cURL"

curl -X POST 'https://api.marqo.ai/api/v2/indexes/my-hybrid-index' \
-H 'x-api-key: XXXXXXXXXXXXXXX' \
-H "Content-Type: application/json" \
-d '{
"model": "no_model",
"modelProperties": {
"type": "no_model",
"dimensions": 3072
},
"type": "structured",
"allFields": [
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{"name": "description", "type": "text", "features": ["lexical_search", "filter"]},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]}
],
"tensorFields": ["title"],
"numberOfShards": 1,
"numberOfReplicas": 0,
"inferenceType": "marqo.CPU.large",
"storageClass": "marqo.basic",
"numberOfInferences": 1
}'

=== "python"

import marqo

mq = marqo.Client("https://api.marqo.ai", api_key="XXXXXXXXXXXXXXX")

mq.create_index(
index_name="my-hybrid-index",
type="structured",
model="no_model",
model_properties={"type": "no_model", "dimensions": 3072},
all_fields=[
{"name": "title", "type": "custom_vector", "features": ["lexical_search"]},
{
"name": "description",
"type": "text",
"features": ["lexical_search", "filter"],
},
{"name": "time_added_epoch", "type": "int", "features": ["score_modifier"]},
],
tensor_fields=["title"],
)
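Because my-hybrid-index uses no_model, documents have to carry their own 3072-dimensional vector for the title field rather than having one computed by a model. The sketch below builds such a document in Python. It assumes the custom_vector value is an object with a "vector" key and an optional "content" key holding the text used for lexical search and filtering (check this against the add documents reference); custom_vector_doc is a hypothetical helper. The resulting dict is what you would pass in an add_documents() call.

```python
DIMENSIONS = 3072  # must match modelProperties["dimensions"] used when creating the index


def custom_vector_doc(doc_id, vector, content=None, **other_fields):
    """Build a document for my-hybrid-index.

    Hypothetical helper; the {"vector": ..., "content": ...} shape is an assumption.
    """
    if len(vector) != DIMENSIONS:
        raise ValueError(f"expected {DIMENSIONS} dimensions, got {len(vector)}")
    title = {"vector": list(vector)}
    if content is not None:
        # Optional text so the custom_vector field can also serve lexical search.
        title["content"] = content
    return {"_id": doc_id, "title": title, **other_fields}


doc = custom_vector_doc(
    "1",
    [0.0] * DIMENSIONS,  # placeholder vector; use your own embedding here
    content="Red running shoes",
    description="Lightweight trainers for road running",
    time_added_epoch=1700000000,
)
```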