# Import Model Registry Using vLLM

## Step 1: Access Model Registry

* Log in with your GreenNode account and navigate to the [Model Registry Dashboard](https://aiplatform.console.vngcloud.vn/registry).
* Find and click the "Import a model registry" button..

## Step 2: Configure Model Registry

* **Region & Model registry name**: Select the region and provide a specific name for your model.
* **Container**: Select the Pre-built container option to use supported frameworks.
* **Framework**: Choose the framework and appropriate version for deploying the model. In this guide, select vLLM 0.7.2.
* **Model Source**: Access the model from your Network Volume, AI Platform Catalog, or directly from Hugging Face.
* **Model Repository**: Select the Network Volume containing your Triton model. You need to prepare the [**model repository**](https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/tutorials/Conceptual_Guide/Part_1-model_deployment/README.html#setting-up-the-model-repository) with the following structure:
* ```
  # Example repository structure
  network-volume/
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      <version>/
        <model-definition-file>
      <version>/
        <model-definition-file>
      ...
    <model-name>/
      [config.pbtxt]
      [<output-labels-file> ...]
      <version>/
        <model-definition-file>
      <version>/
        <model-definition-file>
      ...
    ...
  ```
* **Cấu hình vLLM (vLLM Settings):**
  * **Served model name:** The model name used in the API.\
    Note: This name will also be used in the `model_name` tag
  * **Max number of sequences:** Maximum number of sequences per iteration. Default: 256.
  * **Max Context Length:** Maximum context length of the model. If not specified, the system will automatically use the value from the model configuration.
  * If Tool Call support is enabled:
    * Select the tool call parser type you need (e.g., hermes, mistral).
* Click the "Import" button to complete the process.


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.vngcloud.vn/vng-cloud-document/ai-stack/ai-platform/model-registry/import-model-registry-using-vllm.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
