Join Michael Yuan, CEO of @SecondStateInc, as he explores lightweight large language model (LLM) inference with WebAssembly (WASM).
In this video, Michael demonstrates how to run full-scale LLMs like LLaMA across platforms, from personal laptops to cloud servers, with the efficiency and portability of WASM. He addresses the challenges of running LLMs in cloud environments, walks through practical demos, and discusses future applications.
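Want to try it yourself after watching? Here is a minimal sketch of the kind of demo shown, using WasmEdge's WASI-NN GGML plugin and the LlamaEdge llama-chat.wasm app (the model file and prompt template below are illustrative; swap in the GGUF model you want):

# Install WasmEdge with the WASI-NN GGML (llama.cpp) plugin
curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash -s -- --plugin wasi_nn-ggml

# Download a chat app compiled to WASM, plus a GGUF model file
curl -LO https://github.com/LlamaEdge/LlamaEdge/releases/latest/download/llama-chat.wasm
curl -LO https://huggingface.co/second-state/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q5_K_M.gguf

# Run the model locally; the same .wasm binary runs unchanged on a GPU cloud server
wasmedge --dir .:. --nn-preload default:GGML:AUTO:llama-2-7b-chat.Q5_K_M.gguf llama-chat.wasm -p llama-2-chat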
Learn more about Civo Navigate here -► https://www.civo.com/navigate
====================
Get free credit to try the world’s first K3s-powered, managed Kubernetes service.
Sign up to Civo -► https://www.civo.com
Get started with Kubernetes at Civo Academy -► https://www.civo.com/academy
Subscribe to our YouTube Channel -► http://civo.io/subscribe
Follow Civo:
• Twitter -► https://twitter.com/civocloud
• GitHub -► https://github.com/civo
• LinkedIn -► https://www.linkedin.com/company/civocloud
• Facebook -► https://www.facebook.com/civocloud
0:00 - Introduction and Background
2:00 - Challenges of Running Large Language Models
6:00 - Importance of Portability in AI Workloads
9:00 - Introduction to WebAssembly (WASM)
11:30 - Benefits of WASM for AI Inference
15:00 - Live Demo Setup
18:00 - Running WASM Locally
20:30 - Deploying WASM on Cloud with NVIDIA
23:00 - Practical Use Cases and Applications