Deploy Llama 3 on Amazon SageMaker

Amber Ivanna Trujillo
8 min read · Apr 21, 2024

Earlier today Meta released Llama 3, the next iteration of the open-access Llama family. Llama 3 comes in two sizes: 8B for efficient deployment and development on consumer-size GPUs, and 70B for large-scale AI-native applications. Both come in base and instruction-tuned variants. In addition to the 4 models, a new version of Llama Guard was fine-tuned on Llama 3 8B and released as Llama Guard 2, a safety fine-tune.

In this blog you will learn how to deploy the meta-llama/Meta-Llama-3-70B-Instruct model to Amazon SageMaker. We are going to use the Hugging Face LLM DLC, a purpose-built inference container that makes it easy to deploy LLMs in a secure and managed environment. The DLC is powered by Text Generation Inference (TGI), a scalable, optimized solution for deploying and serving Large Language Models (LLMs). The blog post also covers hardware requirements for the different model sizes.
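As a preview of what the TGI container expects, the sketch below builds the environment configuration for the 70B instruct model. The instance type and token limits are illustrative assumptions based on TGI's documented settings, not values prescribed by this post; tune them for your workload.

```python
# Sketch of the container environment for deploying Llama 3 70B with the
# Hugging Face LLM DLC (TGI). All values below are illustrative assumptions.
instance_type = "ml.p4d.24xlarge"  # 8x A100 40GB; an assumption for 70B

config = {
    "HF_MODEL_ID": "meta-llama/Meta-Llama-3-70B-Instruct",  # model id on the Hub
    "SM_NUM_GPUS": "8",          # shard the model across all GPUs on the instance
    "MAX_INPUT_LENGTH": "2048",  # maximum number of prompt tokens
    "MAX_TOTAL_TOKENS": "4096",  # prompt + generated tokens per request
    "HUGGING_FACE_HUB_TOKEN": "<REPLACE WITH YOUR TOKEN>",  # Llama 3 is gated
}

# TGI reads these as container environment variables, so every value
# must be a string:
assert all(isinstance(v, str) for v in config.values())
```

This dictionary is later passed as the `env` of the model object when deploying with the SageMaker SDK.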

In this blog we will cover how to:

  1. Setup development environment
  2. Hardware requirements
  3. Deploy Llama 3 70b to Amazon SageMaker
  4. Test and chat with the model
  5. Benchmark Llama 3 70B
  6. Clean up

Let's get started!

1. Setup development environment

We are going to use the sagemaker Python SDK to deploy Llama 3 to Amazon SageMaker. We need to make sure to…
