Deploy Llama 3 on Amazon SageMaker
Earlier today Meta released Llama 3, the next iteration of the open-access Llama family. Llama 3 comes in two sizes: 8B for efficient deployment and development on consumer-size GPUs, and 70B for large-scale AI-native applications. Both come in base and instruction-tuned variants. In addition to these four models, a new version of Llama Guard was fine-tuned on Llama 3 8B and released as Llama Guard 2 (safety fine-tune).
In this blog you will learn how to deploy the meta-llama/Meta-Llama-3-70B-Instruct model to Amazon SageMaker. We are going to use the Hugging Face LLM DLC, a purpose-built Inference Container that makes it easy to deploy LLMs in a secure and managed environment. The DLC is powered by Text Generation Inference (TGI), a scalable, optimized solution for deploying and serving Large Language Models (LLMs). The blog post also includes hardware requirements for the different model sizes.
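As a preview of what the deployment looks like, here is a minimal sketch using the `sagemaker` SDK. It builds the TGI configuration that the container reads from environment variables; the GPU count, token limits, and instance type shown are illustrative placeholders, and the deploy call (commented out) assumes an AWS account with a SageMaker execution role:

```python
# Illustrative sketch: configure the Hugging Face LLM DLC (TGI) for Llama 3 70B.
# Values such as SM_NUM_GPUS and the token limits are placeholders, not tuned settings.
import json

config = {
    "HF_MODEL_ID": "meta-llama/Meta-Llama-3-70B-Instruct",  # model id on the Hugging Face Hub
    "SM_NUM_GPUS": json.dumps(8),            # number of GPUs on the instance
    "MAX_INPUT_LENGTH": json.dumps(2048),    # max input tokens per request
    "MAX_TOTAL_TOKENS": json.dumps(4096),    # max input + output tokens per request
    "HUGGING_FACE_HUB_TOKEN": "<REPLACE WITH YOUR TOKEN>",  # required for the gated model
}

# The actual deployment (requires an AWS account and a SageMaker execution role):
# from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri
# llm_model = HuggingFaceModel(
#     role=role,  # your SageMaker execution role
#     image_uri=get_huggingface_llm_image_uri("huggingface"),
#     env=config,
# )
# llm = llm_model.deploy(initial_instance_count=1, instance_type="ml.p4d.24xlarge")
```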
In this blog we will cover how to:
- Setup development environment
- Hardware requirements
- Deploy Llama 3 70b to Amazon SageMaker
- Test and chat with the model
- Benchmark llama 3 70B
- Clean up
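Once the endpoint is up, the "test and chat" step boils down to sending an OpenAI-style chat payload, which recent TGI versions accept via the Messages API. A minimal sketch (the `llm` predictor is assumed to come from the deploy step above; the prompt and generation parameters are illustrative):

```python
# Illustrative sketch: chat with the deployed endpoint via the Messages API.
# Assumes `llm` is the predictor returned by `llm_model.deploy(...)`.
payload = {
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is Amazon SageMaker?"},
    ],
    "max_tokens": 256,     # cap on generated tokens
    "temperature": 0.6,    # sampling temperature
}

# response = llm.predict(payload)
# print(response["choices"][0]["message"]["content"])
```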
Let's get started!