Deploying Models with Inference API
Last updated July 1, 2024
Introduction: The Hugging Face Inference API allows you to deploy models quickly and efficiently, making them accessible for real-time predictions via API calls.
Steps:
- Setting Up Your Model on the Hugging Face Hub
- Upload Your Model: Ensure your model is uploaded to the Hugging Face Hub. You can follow the Hugging Face documentation for detailed steps on uploading.
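For example, here is a minimal sketch of pushing a fine-tuned transformers model to the Hub. The local path and the `{username}/{model_name}` repo id are placeholders, not values from this article; replace them with your own before running.

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a locally fine-tuned model; "./my-finetuned-model" is a hypothetical path.
model = AutoModelForSequenceClassification.from_pretrained("./my-finetuned-model")
tokenizer = AutoTokenizer.from_pretrained("./my-finetuned-model")

# Requires prior authentication, e.g. `huggingface-cli login`.
# "{username}/{model_name}" follows the same placeholder convention used below.
model.push_to_hub("{username}/{model_name}")
tokenizer.push_to_hub("{username}/{model_name}")
```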
- Accessing the Inference API
- API Endpoint: Obtain your API endpoint from the Hugging Face model page. It usually looks like https://api-inference.huggingface.co/models/{username}/{model_name}.
- Authenticating Your API Requests
- API Token: Generate an API token from your Hugging Face account settings.
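Treat the token like a password. One common pattern, sketched below under the assumption that you have exported an `HF_API_TOKEN` environment variable, is to read it from the environment rather than hard-coding it:

```python
import os

# Assumes you ran `export HF_API_TOKEN=hf_...` in your shell beforehand.
your_api_token = os.environ["HF_API_TOKEN"]
```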
- Include Token in Requests:
```python
import requests

API_URL = "https://api-inference.huggingface.co/models/{username}/{model_name}"
headers = {"Authorization": f"Bearer {your_api_token}"}
```
- Making Predictions
- Send Data for Inference:
```python
def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

data = {"inputs": "Your input text here"}
result = query(data)
print(result)
```
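Note that on the very first request a model may still be loading. A minimal retry sketch is shown below; the 503 status code and fixed wait time are assumptions about the API's current warm-up behaviour, so adjust them as needed.

```python
import time

def query_with_retry(payload, max_retries=5, wait_seconds=20):
    """Retry while the model is warming up (assumed to surface as HTTP 503)."""
    for _ in range(max_retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        if response.status_code == 503:
            time.sleep(wait_seconds)
            continue
        response.raise_for_status()
        return response.json()
    raise RuntimeError("Model did not become available in time")
```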
- Handling API Responses
- Interpreting Results: Process the response to extract and use the predicted outputs for your application.
- Example Response Handling:
```python
print(f"Prediction: {result[0]['label']} with score {result[0]['score']}")
```
- Monitoring and Scaling
- Monitor Usage: Use the Hugging Face dashboard to monitor your API usage and performance.
- Scaling Options: Consider upgrading your plan or optimizing your model to handle increased load efficiently.