Cisco AI Infrastructure Power Workshop – 3 days (AI-I-PW3)
Overview
Artificial intelligence (AI) is a major focus across every sector of industry and government. It is a rapidly evolving space, and its advanced capabilities deliver greater insight, knowledge, and operational efficiency in many areas of operation.
Many businesses have identified AI as a strategic objective, but few have advanced to implementation and use. The aim of this workshop is to enable attendees to design, size, deploy, and configure a Cisco-centric AI infrastructure solution. Attendees are assumed to be familiar with the fundamentals of AI technology and concepts.
Objectives
- Provide an overview of how to design and size a Cisco AI POD solution
- Provide an overview of extended GPU operations and AI inferencing
- Describe the technologies that form the AI landscape and how Cisco infrastructure and its ecosystem interact with them
- Present a methodology framework for migrating an AI cloud solution to on-premises infrastructure
Target audience:
- IT Architects and Designers
- Presales SEs
- Network Engineers
- Server Administrators
- AI Integrators
Prerequisite skills:
- Network administration skills
- Understanding of programming concepts
- Conceptual understanding of VMs and containers
- Basic knowledge of the Cisco UCS server environment
- Basic Linux skills
- Fundamentals of AI
Duration:
3 days
Module 1 – AI System Overview
- Overview of AI solutions
- Cisco AI POD overview
- Components – hardware and software
Module 2 – Cisco AI POD Networking
- NDFC and NX-OS configuration
- QoS and CLI output (see the sketch after this list)
- Infra networks
  - Front end
  - Back end
- Host networking
- SuperNIC configuration
- DOCA
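Module 2 is heavy on switch configuration, so a small automation sketch helps set expectations. The following is a minimal example of pulling QoS and PFC state from an NX-OS leaf with Netmiko; the hostname, credentials, and interface are placeholders, and the actual policy values in a deployment come from the Cisco AI POD validated design, not from this sketch.

```python
# Minimal sketch: collect QoS/PFC CLI output from an NX-OS leaf with Netmiko.
# Host, credentials, and interface below are placeholders.
from netmiko import ConnectHandler

nxos_leaf = {
    "device_type": "cisco_nxos",
    "host": "leaf-101.example.com",  # placeholder
    "username": "admin",             # placeholder
    "password": "secret",            # placeholder
}

with ConnectHandler(**nxos_leaf) as conn:
    # Confirm priority flow control is active on the GPU-facing back-end port.
    print(conn.send_command("show interface ethernet 1/1 priority-flow-control"))
    # Check per-queue counters: RoCE traffic should land in the no-drop queue.
    print(conn.send_command("show queuing interface ethernet 1/1"))
```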
Module 3 – AI Storage
- Server disks
  - Boot
- OpenShift Data …
  - Ceph
- What is on the disks
  - Images
  - Container volumes
  - Training data
- Shared storage
  - Object storage
  - Repos – local in OpenShift
- Kubernetes (K8s) storage
  - Persistent volumes (see the sketch after this list)
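As a taste of the Kubernetes storage material, here is a minimal sketch that requests a shared persistent volume for training data through the Kubernetes Python client. The namespace and the storage class name are assumptions; on an OpenShift Data Foundation cluster the class names depend on how storage was installed.

```python
# Minimal sketch: create a shared PVC for training data via the Kubernetes API.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running in a pod
core = client.CoreV1Api()

pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="training-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteMany"],                  # shared across pods
        storage_class_name="ocs-storagecluster-cephfs",  # assumed ODF class name
        resources=client.V1ResourceRequirements(requests={"storage": "500Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(
    namespace="ai-training",  # assumed namespace
    body=pvc,
)
```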
Module 4 – Extended GPU Operations
- GPU clustering
- Back-end network connectivity
  - RoCEv2
  - RDMA
  - IP protocol headers (see the packet sketch after this list)
- Supporting software
- NVLink integration
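The IP protocol headers item is easiest to grasp by building a RoCEv2 frame by hand: RoCEv2 is an InfiniBand Base Transport Header (BTH) carried in UDP with destination port 4791. The sketch below uses Scapy plus manual byte packing; the addresses, DSCP value, and BTH field values are illustrative assumptions, not values from a real fabric.

```python
# Sketch of RoCEv2 encapsulation: Ethernet / IPv4 / UDP(dport=4791) / BTH.
import struct
from scapy.all import Ether, IP, UDP, Raw

# 12-byte BTH: opcode, SE/M/Pad/TVer flags, partition key,
# reserved byte + 24-bit destination QP, AckReq byte + 24-bit PSN.
opcode, flags, pkey = 0x04, 0x00, 0xFFFF   # 0x04 = RC Send Only
dest_qp, psn = 0x000017, 0x000001          # illustrative values
bth = struct.pack("!BBHII", opcode, flags, pkey,
                  dest_qp & 0x00FFFFFF,    # top byte is reserved
                  psn & 0x00FFFFFF)        # top byte carries the AckReq flag

pkt = (
    Ether()
    / IP(src="10.1.1.1", dst="10.1.1.2", tos=26 << 2)  # DSCP 26 is illustrative
    / UDP(sport=49152, dport=4791)                     # 4791 = RoCEv2
    / Raw(load=bth)
)
pkt.show()
```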
Module 5 – Advanced AI Inferencing
- Inference server operations
- Inference frameworks
- vLLM operations (see the sketch after this list)
- NVIDIA NIM
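For the vLLM material, this is roughly what offline batch inference looks like with vLLM's Python API. The model name and tensor_parallel_size are assumptions; match them to the model you are licensed to run and to the GPUs visible on the node.

```python
# Minimal vLLM offline-inference sketch.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model
    tensor_parallel_size=2,                    # assumed: shard across 2 GPUs
)
params = SamplingParams(temperature=0.7, max_tokens=128)

for out in llm.generate(["What is AI inferencing?"], params):
    print(out.outputs[0].text)
```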
Module 6 – How to Manage OpenShift
- Operators for NVIDIA and distributed GPUs (see the sketch after this list)
- Network configuration for the back-end
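Once the NVIDIA GPU Operator is healthy, every GPU worker advertises an nvidia.com/gpu allocatable count to the scheduler. Below is a quick sketch of verifying this with the Kubernetes Python client; node names and counts are cluster-specific.

```python
# Minimal sketch: confirm the GPU Operator has advertised GPUs to the scheduler.
from kubernetes import client, config

config.load_kube_config()
for node in client.CoreV1Api().list_node().items:
    gpus = (node.status.allocatable or {}).get("nvidia.com/gpu", "0")
    print(f"{node.metadata.name}: {gpus} allocatable GPU(s)")
```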
Module 7 – Cloud Migration to On-prem
- Design points
- Reverse-engineering the solution on-prem
- AI workload requirements
- Infrastructure requirements
- Scoping
- Sizing details
- How to gather metrics and quantify requirements (see the sizing sketch after this list)
- Cloud provider scoping and sizing guides
- Migration strategies and framework
  - Quantitative metrics
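To make the sizing discussion concrete, here is a back-of-envelope GPU memory estimate for serving an LLM: weight memory plus KV cache. Every number in the sketch is an illustrative assumption; substitute figures from the actual model card and the measured workload.

```python
# Rule-of-thumb GPU memory estimate for LLM serving (illustrative only).
def serving_memory_gib(params_b, bytes_per_param=2,
                       n_layers=32, n_kv_heads=8, head_dim=128,
                       kv_bytes=2, context_tokens=8192, batch=8):
    weights = params_b * 1e9 * bytes_per_param
    # KV cache per token = 2 (K and V) * layers * KV heads * head dim * bytes
    kv_per_token = 2 * n_layers * n_kv_heads * head_dim * kv_bytes
    kv_cache = kv_per_token * context_tokens * batch
    return (weights + kv_cache) / 2**30

# Example: an 8B-parameter FP16 model serving 8 concurrent 8K-token contexts.
print(f"~{serving_memory_gib(8):.1f} GiB before runtime overhead")
```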