Customers across diverse industries are defining entirely new categories of products and experiences by running intelligent applications that use ML at the core. These applications are becoming more expensive to run in production. AWS Inferentia is a custom-built machine learning inference chip designed to provide high throughput, low latency inference performance at an extremely low cost. Each chip provides hundreds of TOPS of inference throughput to allow complex models to make fast predictions. Join this session to see the latest developments using AWS Inferentia and how they can lower your inference costs in the future.