August 9, 2023

How to implement document boosting with a second index

Learn how to implement promoted search results with Meilisearch by using a second index for pinned documents.

Laurent Cazanove, DX Engineer & Copywriter, @StriftCodes


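In this guide, we'll walk you through implementing promoted search results with Meilisearch. Our goal is to prioritize specific documents when certain keywords match the user's query. These boosted documents should appear at the top of the search results.

This guide explains how to implement promoted documents on the backend. For a frontend-first implementation, see Implementing promoted search results with React InstantSearch.

Overview

Here's a simplified breakdown of how to implement document boosting using a second index for "pinned documents" together with the multi-search feature.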

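Create indexes: set up two indexes, one for regular search and one for boosted results. The boosted index has a special attribute, keywords, that triggers the boosting.
Populate the games index: fill the games index with your dataset using the provided JSON file. This index serves as the source for the boosted documents.
Configure the pinned_games index: configure the pinned_games index so that it displays document attributes without revealing the keywords. Adjust the searchable and displayed attributes accordingly.
Boost documents: identify the documents you want to boost and assign relevant keywords to them. For example, you can assign the keywords fps and shooter to the game Counter-Strike.
Implement multi-search: use Meilisearch's multi-search feature to run the search query against both the regular index and the boosted index. This way, boosted documents that match the keywords appear first.
Display results: present the search results in a user-friendly format, highlighting boosted documents with a visual indicator.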

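Implementation

Installation

Before diving in, make sure Meilisearch is up and running. If you haven't installed it yet, follow these steps:

Launch a Meilisearch instance: you can run Meilisearch locally or via Meilisearch Cloud.
Make sure you have your preferred language SDK (or framework integration) installed.

This guide uses the Python SDK, but it works the same way with any other Meilisearch integration. 🎉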

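Initializing indexes

For our example, we'll work with a dataset of Steam games. You can adapt this process to your own data:

Download the steam-games.json and settings.json files for our Steam games dataset.
Load the dataset into your Meilisearch instance by adding the documents from the steam-games.json file.

`games` index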

import meilisearch
import json

# a default local instance is served over plain HTTP; use https only if TLS is configured
client = meilisearch.Client(url="http://127.0.0.1:7700")
games = client.index("games")

# helper to wait for Meilisearch tasks
def wait_with_progress(client: meilisearch.Client, task_uid: int):
    while True:
        try:
            client.wait_for_task(task_uid, timeout_in_ms=1000)
            break
        except meilisearch.errors.MeilisearchTimeoutError:
            print(".", end="")
    task = client.get_task(task_uid)
    print(f" {task.status}")
    if task.error is not None:
        print(f"{task.error}")
            
print("Adding settings...", end="")
with open("settings.json") as settings_file:
    settings = json.load(settings_file)
    task = games.update_settings(settings)
    wait_with_progress(client, task.task_uid)


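print("Adding documents...", end="")
with open("steam-games.json") as documents_file:
    # `documents` is a parsed list of dicts, so it goes through `add_documents`;
    # `add_documents_json` is meant for a raw JSON string
    documents = json.load(documents_file)
    task = games.add_documents(documents)
    wait_with_progress(client, task.task_uid)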

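`pinned_games` index

This index will contain the promoted documents. The settings of the pinned_games index are the same as those of the games index, with the following differences:

The only entry in searchableAttributes is the keywords attribute, which contains the words that trigger pinning the document.
displayedAttributes lists all the attributes of the documents except keywords (we don't want to show the keywords to end users).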

pinned = client.index("pinned_games")

print("Adding settings...", end="")
with open("settings.json") as settings_file:
    settings = json.load(settings_file)
    settings["searchableAttributes"] = ["keywords"]
    # all but "keywords"
    settings["displayedAttributes"] = ["name", "description", "id", "price", "image", "releaseDate", "recommendationCount", "platforms", "players", "genres", "misc"]
    task = pinned.update_settings(settings)
    # see `wait_with_progress` implementation in previous code sample
    wait_with_progress(client, task.task_uid)
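
As an alternative to hardcoding the attribute list as in the snippet above, the pinned settings could be derived from the base settings. This is a minimal sketch, not part of the original guide: `build_pinned_settings` is a hypothetical helper, and it assumes you supply the full list of document attribute names yourself. The sample settings below are made up for illustration.

```python
from copy import deepcopy
from typing import Any, Dict, List


def build_pinned_settings(
    base_settings: Dict[str, Any],
    all_attributes: List[str],
    keywords_attribute: str = "keywords",
) -> Dict[str, Any]:
    """Derive settings for the pinned index from the regular index settings."""
    settings = deepcopy(base_settings)  # leave the caller's settings untouched
    # Only the keywords attribute should trigger a match in the pinned index.
    settings["searchableAttributes"] = [keywords_attribute]
    # Display every attribute except the keywords attribute.
    settings["displayedAttributes"] = [
        name for name in all_attributes if name != keywords_attribute
    ]
    return settings


# Made-up sample settings, for illustration only.
base = {"searchableAttributes": ["name", "description"], "sortableAttributes": ["price"]}
pinned_settings = build_pinned_settings(base, ["id", "name", "description", "keywords"])
print(pinned_settings)
```

The returned dictionary has the same shape as the `settings` object passed to `pinned.update_settings(...)` above.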

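Updating the promoted documents index

We'll now populate the pinned_games index with the documents from the games index that we want to promote.

As an example, let's say we want to pin the game "Counter-Strike" to the keywords "fps", "first", "person", and "shooter".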

counter_strike = games.get_document(document_id=10)
counter_strike.keywords = ["fps", "first", "person", "shooter"]

print("Adding pinned document...", end="")
# `add_documents` expects a list of documents, so wrap the single document
task = pinned.add_documents([dict(counter_strike)])
wait_with_progress(client, task.task_uid)
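
When several games need pinning, the same step can be expressed as a small pure function that attaches keywords to plain dictionaries. This is a sketch under assumptions: `build_pinned_documents` is a hypothetical helper, the documents are assumed to be plain dicts whose primary key field is `id`, and the sample data is made up.

```python
from typing import Any, Dict, List, Mapping


def build_pinned_documents(
    documents: List[Dict[str, Any]],
    keywords_by_id: Mapping[Any, List[str]],
    id_field: str = "id",
) -> List[Dict[str, Any]]:
    """Return copies of the documents to pin, each with a `keywords` field."""
    pinned = []
    for document in documents:
        keywords = keywords_by_id.get(document[id_field])
        if keywords is None:
            continue  # this document is not meant to be pinned
        # Copy the document so the original is not modified.
        pinned.append({**document, "keywords": list(keywords)})
    return pinned


# Made-up sample data, for illustration only.
sample = [
    {"id": 10, "name": "Counter-Strike"},
    {"id": 20, "name": "Some Other Game"},
]
to_pin = build_pinned_documents(sample, {10: ["fps", "first", "person", "shooter"]})
print(to_pin)
```

The resulting list could then be sent to the pinned index in a single `pinned.add_documents(to_pin)` call.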

Customizing search results

Now, let's create a function that returns search results with the pinned documents first.

from copy import deepcopy
from typing import Any, Dict, List
from dataclasses import dataclass

@dataclass
class SearchResults:
    pinned: List[Dict[str, Any]]
    regular: List[Dict[str, Any]]

def search_with_pinned(client: meilisearch.Client, query: Dict[str, Any]) -> SearchResults:
    pinned_query = deepcopy(query)
    pinned_query["indexUid"] = "pinned_games"
    regular_query = deepcopy(query)
    regular_query["indexUid"] = "games"
    results = client.multi_search([pinned_query, regular_query])
    # fetch the limit that was passed to each query so that we can respect that value when getting the results from each source
    limit = results["results"][0]["limit"]
    # fetch as many results from the pinned source as possible
    pinned_results = results["results"][0]["hits"]
    # only fetch results from the regular source up to limit
    regular_results = results["results"][1]["hits"][:(limit-len(pinned_results))]
    return SearchResults(pinned=pinned_results, regular=regular_results)
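
Because the pinned documents are copied from the games index, a pinned game may also show up among the regular hits. One possible refinement, sketched here with plain lists rather than live Meilisearch responses, is to drop regular hits whose identifier already appears in the pinned hits. `merge_without_duplicates` is a hypothetical helper, and it assumes every hit carries an `id` field, which means `id` would need to be included in `attributesToRetrieve`.

```python
from typing import Any, Dict, List, Tuple

Hit = Dict[str, Any]


def merge_without_duplicates(
    pinned_hits: List[Hit],
    regular_hits: List[Hit],
    limit: int,
    id_field: str = "id",
) -> Tuple[List[Hit], List[Hit]]:
    """Keep pinned hits first and drop regular hits that are already pinned."""
    pinned = pinned_hits[:limit]
    pinned_ids = {hit[id_field] for hit in pinned}
    # Slots left for regular hits once the pinned hits are placed.
    remaining = limit - len(pinned)
    regular = [hit for hit in regular_hits if hit[id_field] not in pinned_ids]
    return pinned, regular[:remaining]


# Made-up hits, for illustration only.
pinned_hits = [{"id": 10, "name": "Counter-Strike"}]
regular_hits = [
    {"id": 10, "name": "Counter-Strike"},
    {"id": 1, "name": "Game A"},
    {"id": 2, "name": "Game B"},
    {"id": 3, "name": "Game C"},
]
pinned, regular = merge_without_duplicates(pinned_hits, regular_hits, limit=3)
print(pinned, regular)
```

Note that removing duplicates can leave fewer than `limit` results in total, since the regular query itself only returns up to `limit` hits.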

We can use this function to retrieve search results with the promoted documents:

results = search_with_pinned(client, {"q": "first person shoot", "attributesToRetrieve": ["name"]})

The results object should look something like this:

SearchResults(pinned=[{'name': 'Counter-Strike'}], regular=[{'name': 'Rogue Shooter: The FPS Roguelike'}, {'name': 'Rocket Shooter'}, {'name': 'Masked Shooters 2'}, {'name': 'Alpha Decay'}, {'name': 'Red Trigger'}, {'name': 'RAGE'}, {'name': 'BRINK'}, {'name': 'Voice of Pripyat'}, {'name': 'HAWKEN'}, {'name': 'Ziggurat'}, {'name': 'Dirty Bomb'}, {'name': 'Gunscape'}, {'name': 'Descent: Underground'}, {'name': 'Putrefaction'}, {'name': 'Killing Room'}, {'name': 'Hard Reset Redux'}, {'name': 'Bunny Hop League'}, {'name': 'Kimulator : Fight for your destiny'}, {'name': 'Intrude'}])
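
The overview mentions highlighting boosted documents with a visual indicator. A minimal text-only sketch of that step could look like the following; `format_results` and the `[pinned]` marker are hypothetical choices, and the function takes the two hit lists directly so that it runs without a Meilisearch instance.

```python
from typing import Any, Dict, List


def format_results(
    pinned: List[Dict[str, Any]],
    regular: List[Dict[str, Any]],
    marker: str = "[pinned]",
) -> List[str]:
    """Return one display line per hit, with pinned hits marked and listed first."""
    lines = [f"{marker} {hit['name']}" for hit in pinned]
    lines += [hit["name"] for hit in regular]
    return lines


# Names taken from the sample output above.
lines = format_results(
    [{"name": "Counter-Strike"}],
    [{"name": "Rocket Shooter"}, {"name": "RAGE"}],
)
for line in lines:
    print(line)
```

With the function from this guide, the call would be `format_results(results.pinned, results.regular)`.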