Self-hosted CLI

This page provides documentation for the NVCF Self-hosted CLI, a command-line interface for managing NVIDIA Cloud Functions in self-hosted deployments.

Overview

The NVCF Self-hosted CLI provides:

  • Automatic Token Generation: Generate admin tokens and API keys via direct API calls
  • Smart State Management: Persistent workflow context eliminates manual ID copying
  • Multi-Environment Support: Separate configurations for dev/staging/production
  • gRPC Invocation: Native support for gRPC function invocation
  • Shell Completion: Autocompletion for bash, zsh, fish, and PowerShell

The CLI is available as a container image from NGC. See self-hosted-artifact-manifest for the full artifact path.

Prerequisites

  • Network access to NVCF API endpoints
  • NGC CLI installed (for downloading the CLI release from NGC)

Installation

Download from NGC

The CLI is available as a resource from NGC. See download-nvcf-cli for detailed download and extraction instructions.

The downloaded package includes:

  • nvcf-cli - The CLI binary
  • .nvcf-cli.yaml.template - Configuration template
  • examples/ - Sample configuration files
  • USAGE-GUIDE.md - Detailed usage documentation

Configuration

The CLI uses YAML configuration files. After extracting the CLI, copy the included template:

$cp .nvcf-cli.yaml.template .nvcf-cli.yaml

Configuration files are searched in this order:

  1. Explicit path via --config flag (highest priority)
  2. Current directory: ./.nvcf-cli.yaml
  3. Home directory: ~/.nvcf-cli.yaml

Place your .nvcf-cli.yaml in the directory where you run the CLI for project-specific configuration, or in your home directory for global configuration.
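As a sketch, the lookup described above (ignoring the --config flag, which always wins) behaves like this hypothetical helper:

```shell
# Hypothetical sketch of the documented search order; not part of the CLI.
# An explicit --config path, when given, is used before either location is checked.
find_config() {
  for cfg in "./.nvcf-cli.yaml" "$HOME/.nvcf-cli.yaml"; do
    if [ -f "$cfg" ]; then
      printf '%s\n' "$cfg"   # first existing file wins
      return 0
    fi
  done
  return 1                   # no configuration file found
}
```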

Self-Hosted Configuration

For self-hosted deployments, the CLI must be configured to communicate with your gateway. The gateway uses hostname-based routing for HTTP services.

For a complete understanding of how the gateway routes traffic, including architecture diagrams, verification commands, and production DNS/HTTPS setup, see gateway-routing.

Prepare Gateway API ingress

For one-click installs on a remote cluster, set up Gateway API ingress before running self-hosted up. The command installs the control plane, then calls the configured API, API Keys, invocation, and gRPC endpoints during its health-check and cluster-registration phases.

Complete the Gateway quickstart before you configure the CLI. The shared quickstart installs the Gateway API CRDs, creates and labels the required namespaces, installs Envoy Gateway, creates the GatewayClass and Gateway, waits for the Gateway to be programmed, and exports the following variables:

$echo "$HTTP_GATEWAY_NAMESPACE/$HTTP_GATEWAY_NAME"
$echo "$GRPC_GATEWAY_NAMESPACE/$GRPC_GATEWAY_NAME"
$echo "$GATEWAY_ADDR"
$echo "$GRPC_GATEWAY_ADDR"

These are the same Gateway setup steps used by the one-click, Helmfile, and standalone install paths. Keep the exported values in your shell, then configure the CLI.

For test environments without production DNS, use the Gateway load balancer address as the stack domain:

$export STACK_DOMAIN="$GATEWAY_ADDR"

For production environments, set STACK_DOMAIN to the DNS name that your HTTPRoute hostnames use.
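The per-service hostnames used throughout this page are fixed prefixes on STACK_DOMAIN. A quick sketch, with a hypothetical domain:

```shell
# Illustration only: "nvcf.example.com" is a placeholder domain
export STACK_DOMAIN="nvcf.example.com"
echo "api.${STACK_DOMAIN}"          # NVCF API host
echo "api-keys.${STACK_DOMAIN}"     # API Keys service host
echo "invocation.${STACK_DOMAIN}"   # Invocation service host
```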

Configuring the CLI

Create your configuration file:

$# Copy the template
$cp .nvcf-cli.yaml.template .nvcf-cli.yaml

Complete self-hosted configuration:

# ==============================================================================
# API Endpoints - Point to your gateway load balancer
# ==============================================================================

# Main API endpoint (use http:// for non-TLS setups)
base_http_url: "http://<GATEWAY_ADDR>"

# Invocation endpoint (same as base_http_url for self-hosted)
invoke_url: "http://<GATEWAY_ADDR>"

# gRPC endpoint - uses dedicated TCP port (no Host header needed)
base_grpc_url: "<GRPC_GATEWAY_ADDR>:10081"

# API Keys service endpoint
api_keys_service_url: "http://<GATEWAY_ADDR>"

# ==============================================================================
# Host Header Overrides (Required for Hostname-Based Routing)
# ==============================================================================
#
# Because the gateway routes HTTP requests based on the Host header,
# you must specify the correct Host header for each service.
# These values must match the HTTPRoute hostnames rendered from STACK_DOMAIN.
#
# Without these, the gateway returns 404 because it can't match the route.

# Host header for API Keys service
api_keys_host: "api-keys.<STACK_DOMAIN>"

# Host header for NVCF API (function management)
api_host: "api.<STACK_DOMAIN>"

# Host header for Invocation service
invoke_host: "invocation.<STACK_DOMAIN>"

# ==============================================================================
# API Keys Service Configuration
# ==============================================================================

api_keys_service_id: "nvidia-cloud-functions-ncp-service-id-aketm"
api_keys_issuer_service: "nvcf-api"
api_keys_owner_id: "svc@nvcf-api.local"

# ==============================================================================
# Account Configuration
# ==============================================================================

client_id: "nvcf-default"

# ==============================================================================
# Debugging (optional - set to true for verbose output)
# ==============================================================================

debug: false

For test environments without production DNS, the URL fields and host fields can use the Gateway load balancer address:

# Gateway load balancer address
base_http_url: "http://a1b2c3d4e5f6.us-west-2.elb.amazonaws.com"
invoke_url: "http://a1b2c3d4e5f6.us-west-2.elb.amazonaws.com"
base_grpc_url: "a1b2c3d4e5f6.us-west-2.elb.amazonaws.com:10081"
api_keys_service_url: "http://a1b2c3d4e5f6.us-west-2.elb.amazonaws.com"

# Host headers matching HTTPRoute configuration
api_keys_host: "api-keys.a1b2c3d4e5f6.us-west-2.elb.amazonaws.com"
api_host: "api.a1b2c3d4e5f6.us-west-2.elb.amazonaws.com"
invoke_host: "invocation.a1b2c3d4e5f6.us-west-2.elb.amazonaws.com"

# API Keys service
api_keys_service_id: "nvidia-cloud-functions-ncp-service-id-aketm"
api_keys_issuer_service: "nvcf-api"
api_keys_owner_id: "svc@nvcf-api.local"

client_id: "nvcf-default"
debug: true
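With the Gateway values exported from the quickstart, a configuration like the one above can be rendered with a heredoc. This is a sketch rather than a shipped script: it assumes GATEWAY_ADDR and GRPC_GATEWAY_ADDR are set (placeholder fallbacks are supplied so it runs standalone) and omits the API Keys service keys, which you would copy unchanged from the template:

```shell
# Placeholder fallbacks so the sketch runs even without the quickstart exports
: "${GATEWAY_ADDR:=a1b2c3d4e5f6.us-west-2.elb.amazonaws.com}"
: "${GRPC_GATEWAY_ADDR:=$GATEWAY_ADDR}"

# Render the URL and Host-header fields from the exported addresses
cat > .nvcf-cli.yaml <<EOF
base_http_url: "http://${GATEWAY_ADDR}"
invoke_url: "http://${GATEWAY_ADDR}"
base_grpc_url: "${GRPC_GATEWAY_ADDR}:10081"
api_keys_service_url: "http://${GATEWAY_ADDR}"
api_keys_host: "api-keys.${GATEWAY_ADDR}"
api_host: "api.${GATEWAY_ADDR}"
invoke_host: "invocation.${GATEWAY_ADDR}"
client_id: "nvcf-default"
debug: true
EOF

grep '^api_host' .nvcf-cli.yaml   # spot-check the rendered Host header
```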

Verifying Your Configuration

After configuring the CLI, verify connectivity:

$# Test token generation (uses api_keys_host)
$./nvcf-cli init
$
$# Expected output:
$# [INFO] Starting fresh session...
$# [INFO] Generating admin token from API Keys service...
$# [DEBUG] Using Host header override: api-keys.<your-gateway>
$# [SUCCESS] Admin token generated and saved

If you see a 404 error, verify:

  1. The api_keys_host value matches your HTTPRoute hostname
  2. The gateway load balancer is accessible
  3. The API Keys service is running: kubectl get pods -n api-keys

Why Host headers are needed: the Envoy Gateway uses hostname-based routing to direct traffic to different backend services through a single load balancer. Without the correct Host header, the gateway cannot match the request to a route and returns 404.

gRPC does not need Host headers because it uses a dedicated TCP listener on port 10081. The gateway routes all traffic on that port directly to the gRPC service without hostname matching.

Production Setup: DNS and HTTPS

The Host header configuration above is designed for testing and development. For production deployments, configure proper DNS and TLS to eliminate the need for Host header overrides.

With proper DNS and HTTPS configured:

  • DNS records resolve service hostnames directly to your Gateway’s load balancer
  • TLS certificates secure all traffic
  • The CLI uses simple URLs without Host header overrides
  • Browsers and other clients can access services directly

# Simple URLs using your domain - no Host header overrides needed!
base_http_url: "https://api.nvcf.example.com"
invoke_url: "https://invocation.nvcf.example.com"
base_grpc_url: "grpc.nvcf.example.com:443"
api_keys_service_url: "https://api-keys.nvcf.example.com"

# No host header overrides required - DNS handles routing
# api_keys_host: "" # Not needed

For complete instructions on setting up DNS records and TLS certificates, see production-dns-https in the Gateway Routing guide.

Multi-Environment Setup

Use the --config flag to manage multiple environments with separate configuration files:

$# Development
$./nvcf-cli --config dev.yaml init
$./nvcf-cli --config dev.yaml function create --input-file function.json
$
$# Production
$./nvcf-cli --config prod.yaml init
$./nvcf-cli --config prod.yaml function list

Each configuration maintains separate state files (e.g., ~/.nvcf-cli.dev.state for dev.yaml).
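The state file name appears to be derived from the config file's base name. As a sketch of that pattern (an assumption generalized from the dev.yaml example above):

```shell
# Assumed naming pattern, generalized from the documented dev.yaml example
config="dev.yaml"                                 # per-environment config file
state="$HOME/.nvcf-cli.${config%.yaml}.state"     # strip .yaml, build state path
echo "$state"                                     # e.g. /home/user/.nvcf-cli.dev.state
```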

Debug Mode

Enable debug mode for detailed logging by adding to your configuration file:

debug: true

Or use the --debug flag or NVCF_DEBUG=true environment variable per-command.

Quick Start

$# 1. Initialize - generate admin token
$./nvcf-cli init
$
$# 2. Generate API key for invocations
$./nvcf-cli api-key generate
$
$# 3. Create a function (update the example file with your image first)
$./nvcf-cli function create --input-file examples/create-function.json
$
$# 4. Deploy the function (uses saved context automatically)
$./nvcf-cli function deploy create
$
$# 5. Invoke the function
$./nvcf-cli function invoke --request-body '{"message": "hello world"}'
$
$# 6. Clean up
$./nvcf-cli function deploy remove
$./nvcf-cli function delete

For immediate testing, you can use load_tester_supreme from nvcf-onprem (see self-hosted-artifact-manifest), which supports the {"message": "hello world"} request body above. For more function samples, see the nv-cloud-function-helpers repository; for function creation documentation, see function-creation.

Authentication

The CLI supports two types of authentication tokens:

  • Admin Token (NVCF_TOKEN): For function management (create, deploy, update, delete)
  • API Key (NVCF_API_KEY): For user operations (invoke, list, queue status)

Generate Admin Token

$# Generate fresh admin token (clears existing state)
$./nvcf-cli init
$
$# With debug output
$./nvcf-cli init --debug
$
$# Example output:
$# [INFO] Starting fresh session...
$# [INFO] Generating admin token from API Keys service...
$# [SUCCESS] Admin token generated and saved
$# Token: <admin-token>
$# Expires: 2025-11-19 06:08:15

Refresh Admin Token

Refresh your token while preserving function context:

$# Refresh token (keeps current function state)
$./nvcf-cli refresh
$
$# Example output:
$# [SUCCESS] Admin token refreshed
$# Function ID: func-abc123 (preserved)

Generate API Key

$# Generate with defaults (24h expiration)
$./nvcf-cli api-key generate
$
$# Custom expiration and description
$./nvcf-cli api-key generate --expires-in 48h --description "Production key"
$
$# Generate with custom scopes
$./nvcf-cli api-key generate --scopes invoke_function,list_functions
$
$# Generate and validate
$./nvcf-cli api-key generate --validate

Available scopes for API keys (all included by default):

Scope | Description
invoke_function | Execute deployed functions
list_functions | View available functions
list_functions_details | View detailed function metadata
queue_details | Monitor function execution queues
manage_registries | Manage registry credentials

Command Reference

Self-hosted Deployment Commands

Use these commands to install and inspect self-hosted NVCF deployments. For the full fresh-install walkthrough, see Quickstart.

Command | Description
self-hosted check --pre | Check local tools and Kubernetes access before installation.
self-hosted check --all | Run all currently available self-hosted checks. Use this with pod, route, and function smoke validation.
self-hosted up --cluster-name <cluster-name> --nca-id <nca-id> --region <region> | Run the one-click fresh-install flow.
self-hosted status | Show a deployment health summary.
self-hosted install --control-plane | Run the control-plane installation primitive.
self-hosted install --compute-plane --cluster-name <cluster-name> | Run the compute-plane installation primitive for a registered GPU cluster.
self-hosted uninstall --compute-plane --cluster-name <cluster-name> | Remove compute-plane components for the GPU cluster.
self-hosted uninstall --control-plane | Remove control-plane components.

For separate control-plane and GPU clusters, pass both kube contexts:

$./nvcf-cli self-hosted up \
> --control-plane-context <control-plane-context> \
> --compute-plane-context <gpu-cluster-context> \
> --cluster-name <cluster-name> \
> --nca-id <nca-id> \
> --region <region>

For a single cluster, omit both context flags.

General Commands

Command | Description
init | Generate admin token and start fresh session
refresh | Refresh admin token while preserving function context
status | Display CLI state and configuration (use --show-tokens for full token output)
version | Show CLI version information
completion | Generate shell autocompletion scripts (supports bash, zsh, fish, powershell)

API Key Commands

Command | Description
api-key generate | Generate a new API key for function operations
api-key list | List all API keys
api-key show | Show the current saved API key
api-key delete | Delete a specific API key (supports --force)
api-key revoke | Revoke an API key (same as delete, supports --force)
api-key clear | Clear saved API key from state (supports --force)
api-key clear-all | Delete all API keys for an owner (supports --force)

Function Management Commands

Create Function

$# Create from JSON file
$./nvcf-cli function create --input-file examples/create-function.json
$
$# Create with CLI flags
$./nvcf-cli function create \
> --name "my-function" \
> --image "nvcr.io/your-org/your-image:tag" \
> --inference-url "/predict" \
> --inference-port 8000 \
> --health-uri "/health" \
> --health-port 8000
$
$# Create with additional options
$./nvcf-cli function create \
> --name "my-function" \
> --image "nvcr.io/your-org/your-image:tag" \
> --inference-url "/predict" \
> --inference-port 8000 \
> --function-type STREAMING \
> --container-env "KEY1=value1" \
> --secrets "API_KEY=secret123" \
> --tags "production,v2" \
> --rate-limit "100-S"
$
$# Create an LLM function with model routing metadata
$./nvcf-cli function create \
> --name "my-llm-function" \
> --image "nvcr.io/example/openai-compatible:latest" \
> --inference-url "/" \
> --inference-port 8000 \
> --function-type LLM \
> --llm-model "name=dummy-model,uris=/v1/chat/completions|/v1/responses,routingMethod=round_robin,tokenRateLimit=1000-M"

All function create flags:

Flag | Description
--name | Function name (required)
--image | Container image (required)
--inference-url | Inference endpoint URL (required)
--inference-port | Inference endpoint port (required)
--input-file | JSON file with function configuration
--description | Function description
--function-type | DEFAULT, STREAMING, or LLM (default: DEFAULT)
--api-body-format | API body format (default: CUSTOM)
--health-uri | Health check endpoint URI
--health-port | Health check endpoint port
--health-protocol | Health protocol (HTTP or gRPC)
--health-timeout | Health check timeout (ISO 8601 duration, e.g., PT30S)
--health-expected-status | Expected health check status code (default: 200)
--container-args | Arguments for container launch
--container-env | Environment variables in key=value format (repeatable)
--secrets | Secrets in name=value format (repeatable)
--tags | Comma-separated tags
--models | Model artifacts in name:version:uri format (repeatable)
--llm-model | LLM model config in name=MODEL,uris=URI|URI,routingMethod=round_robin|power_of_two|random,tokenRateLimit=LIMIT format (repeatable)
--resources | Resource artifacts in name:version:uri format (repeatable)
--helm-chart | Helm chart specification
--helm-chart-service | Helm chart service name
--rate-limit | Rate limit pattern (e.g., 100-S, 50-M, 10-H, 5-D)
--rate-limit-exempted | NCA IDs exempted from rate limiting (repeatable)
--rate-limit-sync | Enable synchronous rate limit checking
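The rate-limit pattern is <count>-<unit>, where the unit letter is S, M, H, or D (presumably per second, minute, hour, or day, judging from the examples). A hedged client-side sanity check, not part of the CLI itself:

```shell
# Sanity-check helper (not part of the CLI): accept "<count>-<S|M|H|D>"
is_rate_limit() {
  printf '%s' "$1" | grep -Eq '^[0-9]+-(S|M|H|D)$'
}

is_rate_limit "100-S" && echo "100-S ok"         # matches the documented pattern
is_rate_limit "100-X" || echo "100-X rejected"   # unknown unit letter
```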

Example function JSON:

{
  "name": "my-inference-function",
  "containerImage": "nvcr.io/your-org/your-image:tag",
  "inferenceUrl": "/predict",
  "inferencePort": 8000,
  "health": {
    "protocol": "HTTP",
    "uri": "/health",
    "port": 8000,
    "timeout": "PT30S",
    "expectedStatusCode": 200
  }
}

LLM functions use functionType: "LLM" and define model routing metadata under models[].llmConfig:

{
  "name": "sample-llm-function",
  "containerImage": "nvcr.io/example/openai-compatible:latest",
  "inferenceUrl": "/",
  "inferencePort": 8000,
  "functionType": "LLM",
  "models": [
    {
      "name": "dummy-model",
      "llmConfig": {
        "uris": ["/v1/chat/completions", "/v1/responses"],
        "routingMethod": "round_robin",
        "tokenRateLimit": "1000-M"
      }
    }
  ]
}

For LLM models, llmConfig.routingMethod accepts round_robin, power_of_two, or random.

Deploy Function

The function deploy command group manages deployments with the following subcommands:

Command | Description
function deploy create | Create a new deployment for a function
function deploy update | Update an existing deployment
function deploy get | Get deployment details (supports --json for raw output)
function deploy remove | Remove a function deployment

$# Deploy using saved context (from create)
$./nvcf-cli function deploy create
$
$# Deploy with explicit IDs and configuration
$./nvcf-cli function deploy create \
> --function-id <function-id> \
> --version-id <version-id> \
> --instance-type "NCP.GPU.A10G_1x" \
> --gpu "A10G" \
> --min-instances 1 \
> --max-instances 1
$
$# Deploy from JSON file
$./nvcf-cli function deploy create --input-file examples/deploy-function.json
$
$# View deployment details
$./nvcf-cli function deploy get \
> --function-id <function-id> \
> --version-id <version-id>
$
$# Update an existing deployment
$./nvcf-cli function deploy update \
> --function-id <function-id> \
> --version-id <version-id> \
> --gpu "A10G" \
> --instance-type "NCP.GPU.A10G_1x" \
> --min-instances 2 \
> --max-instances 4
$
$# Remove a deployment
$./nvcf-cli function deploy remove \
> --function-id <function-id> \
> --version-id <version-id>

Key function deploy create flags:

Flag | Description
--function-id | Function ID (uses state if not specified)
--version-id | Version ID (uses state if not specified)
--gpu | GPU name (default: H100)
--instance-type | Instance type (default: NCP.GPU.H100_1x)
--min-instances | Minimum instances (default: 1)
--max-instances | Maximum instances (default: 1)
--max-request-concurrency | Max request concurrency (1-1024)
--input-file | JSON file with deployment configuration
--backend | Backend/CSP for the GPU instance
--regions | Allowed deployment regions (repeatable)
--clusters | Specific clusters (repeatable)
--availability-zones | Availability zones (repeatable)
--timeout | Deployment timeout in seconds (default: 900)
--storage | Available storage (e.g., 80G)
--system-memory | Amount of RAM
--gpu-memory | Amount of GPU memory

Example deployment JSON:

{
  "deploymentSpecifications": [
    {
      "gpu": "A10G",
      "instanceType": "NCP.GPU.A10G_1x",
      "minInstances": 1,
      "maxInstances": 1
    }
  ]
}
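A deployment file like the one above can be generated in place at the path the --input-file examples use. A sketch with illustrative values only:

```shell
mkdir -p examples
# Write the deployment spec shown above to the path used by --input-file
cat > examples/deploy-function.json <<'EOF'
{
  "deploymentSpecifications": [
    {
      "gpu": "A10G",
      "instanceType": "NCP.GPU.A10G_1x",
      "minInstances": 1,
      "maxInstances": 1
    }
  ]
}
EOF
```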

List and Get Functions

$# List all functions
$./nvcf-cli function list
$
$# List function IDs only
$./nvcf-cli function list-ids
$
$# List versions of a specific function
$./nvcf-cli function list-versions <function-id>
$
$# Get details of a specific function version
$./nvcf-cli function get \
> --function-id <function-id> \
> --version-id <version-id>
$
$# Get details as raw JSON
$./nvcf-cli function get \
> --function-id <function-id> \
> --version-id <version-id> \
> --json

Update Function

$# Update function tags
$./nvcf-cli function update \
> --function-id <function-id> \
> --version-id <version-id> \
> --tags "production,v2"
$
$# Update from JSON file
$./nvcf-cli function update \
> --function-id <function-id> \
> --version-id <version-id> \
> --input-file metadata-update.json

Invoke Function

$# Invoke using saved context
$./nvcf-cli function invoke --request-body '{"input": "Hello, World!"}'
$
$# gRPC invocation
$./nvcf-cli function invoke --grpc --request-body '{"input": "test"}'
$
$# gRPC with custom service and method
$./nvcf-cli function invoke --grpc \
> --grpc-service "MyService" \
> --grpc-method "Predict" \
> --request-body '{"input": "test"}'
$
$# Invoke with explicit IDs and timeout
$./nvcf-cli function invoke \
> --function-id <function-id> \
> --version-id <version-id> \
> --request-body '{"input": "Hello!"}' \
> --timeout 120

LLM functions are invoked through the LLM invocation route and OpenAI-compatible paths:

$curl -sS -X POST "https://llm.invocation.<domain>/v1/chat/completions" \
> -H "Authorization: Bearer ${NVCF_API_KEY}" \
> -H "Content-Type: application/json" \
> -d '{"model":"<function-id>/dummy-model","stream":true,"messages":[{"role":"user","content":"Hello"}]}'

Use the OpenAI model value <function-id>/<model-name> for LLM invocation requests.
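Composing that model value is simple string concatenation. A sketch with hypothetical IDs:

```shell
FUNCTION_ID="func-abc123"      # hypothetical function ID
MODEL_NAME="dummy-model"       # model name from the function's llmConfig
MODEL="${FUNCTION_ID}/${MODEL_NAME}"
# Request body for the OpenAI-compatible chat completions path
printf '{"model":"%s","stream":true,"messages":[{"role":"user","content":"Hello"}]}\n' "$MODEL"
```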

Additional function invoke flags:

Flag | Description
--grpc | Use gRPC invocation
--grpc-service | gRPC service name
--grpc-method | gRPC method name
--grpc-plaintext | Use plaintext (insecure) gRPC
--timeout | Request timeout in seconds (default: 60)
--poll-duration | Initial polling duration in seconds (default: 5)
--poll-rate | Polling rate in seconds (default: 3)
--input-file | JSON file with invocation configuration
--input-asset-references | Input asset references (repeatable)

Queue Management

$# Get queue status for a function
$./nvcf-cli function queue status <function-id> <version-id>
$
$# Get position for a specific request
$./nvcf-cli function queue position <request-id>

Delete Function

$# Delete current function from state
$./nvcf-cli function delete
$
$# Delete specific function
$./nvcf-cli function delete --function-id <func-id> --version-id <ver-id>
$
$# Delete deployment only (keep function definition)
$./nvcf-cli function delete --deployment-only --graceful

Registry Credentials Commands

Manage container registry credentials for function images and Helm charts. For comprehensive setup instructions including IAM configuration for AWS ECR, see third-party-registries-self-hosted.

Command | Description
registry-credential add | Add a new registry credential
registry-credential list | List all registry credentials
registry-credential get | Get details of a specific credential
registry-credential update | Update an existing credential
registry-credential delete | Delete a registry credential
registry-credential list-recognized | List all recognized registries

$# Add registry credentials using base64 secret
$./nvcf-cli registry-credential add \
> --hostname "nvcr.io" \
> --secret "<BASE64_ENCODED_USERNAME:PASSWORD>" \
> --artifact-type CONTAINER \
> --description "NGC Container Registry"
$
$# Add registry credentials using username/password
$./nvcf-cli registry-credential add \
> --hostname "nvcr.io" \
> --username "<USERNAME>" \
> --password "<PASSWORD>" \
> --artifact-type CONTAINER
$
$# List registry credentials (with optional filters)
$./nvcf-cli registry-credential list
$./nvcf-cli registry-credential list --artifact-type CONTAINER
$./nvcf-cli registry-credential list --provisioned-by USER
$
$# Get details for a specific credential
$./nvcf-cli registry-credential get <credential-id>
$
$# Update a credential
$./nvcf-cli registry-credential update <credential-id> \
> --username "<NEW_USERNAME>" \
> --password "<NEW_PASSWORD>"
$
$# Delete registry credentials
$./nvcf-cli registry-credential delete <credential-id>
$./nvcf-cli registry-credential delete <credential-id> --force
$
$# List recognized registries
$./nvcf-cli registry-credential list-recognized
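The --secret flag expects the base64 encoding of username:password. A sketch with placeholder credentials:

```shell
USERNAME='<USERNAME>'    # placeholder: your registry username
PASSWORD='<PASSWORD>'    # placeholder: your registry password or token
# Encode "username:password" with no trailing newline before encoding
SECRET=$(printf '%s:%s' "$USERNAME" "$PASSWORD" | base64)
echo "$SECRET"
```

Pass the resulting value to registry-credential add --secret.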

Troubleshooting

Authentication Errors

401 Unauthorized on function creation:

$# Regenerate admin token
$./nvcf-cli init --debug
$
$# Verify token is being used
$./nvcf-cli function create --debug --input-file function.json
$# Look for: "Using FUNCTION TOKEN for POST"

403 Forbidden on invocation:

$# Regenerate API key
$./nvcf-cli api-key generate --validate
$
$# Verify API key is being used
$./nvcf-cli function invoke --debug --request-body '{"input": "test"}'
$# Look for: "Using API KEY for POST"

Token Expiration

$# Check token status
$./nvcf-cli status
$
$# Refresh admin token
$./nvcf-cli refresh
$
$# Regenerate API key
$./nvcf-cli api-key generate

Connection Issues

$# Enable debug mode to see request details
$./nvcf-cli function list --debug
$
$# Verify CLI state and configuration
$./nvcf-cli status

Token Usage Summary

Operation | Required Token | Notes
function create | NVCF_TOKEN | Admin token required
function deploy create | NVCF_TOKEN | Falls back to API key
function delete | NVCF_TOKEN | No fallback - admin only
function invoke | NVCF_API_KEY | Falls back to admin token
function list | NVCF_API_KEY | Falls back to admin token

For additional troubleshooting, see self-hosted-troubleshooting.