Cisco AI Infrastructure Power Workshop – 2 days (AI-I-PW2)
Artificial intelligence (AI) is a major focus across industry and government. It is a rapidly evolving space whose capabilities provide greater insight, knowledge and operational efficiency in many areas of operation.
Many businesses have identified AI as a strategic objective, but few have advanced to implementation and use.
The aim of this workshop is to explore AI concepts and how Cisco provides an AI-ready Data Center.
Objectives
- Provide an overview of how to design and size the Cisco AI POD solution, with a focus on inferencing
- Provide an overview of extended GPU operations and AI inferencing
- Provide an overview of GPU operations and their functional role in AI
Target audience:
- IT Architects and Designers
- Presales SEs
- Network Engineers
- Server Administrators
- AI Integrators
Prerequisite skills:
- Network administration skills
- Understanding of programming concepts
- Conceptual understanding of VMs and containers
- Basic knowledge of the Cisco UCS server environment
- Basic Linux knowledge
Duration:
2 days
Module 1 – AI Revolution
- Changing Landscape
- AI Overview
- Who will use AI
- AI Use Cases
- Location of AI services
- Introduction to Cisco AI
Module 2 – AI System Component Overview
- The Anatomy of an AI system
- Cisco AI Building blocks
Module 3 – AI Apps and Models
- AI Apps and Model Overview
- AI challenges
- AI Adaptation and Enhancement
- Inferencing Process Overview
Module 4 – Cisco AI Partners
- Cisco Ecosystem overview
- NVIDIA Offerings
- Red Hat OpenShift Overview
- Storage Partners
- Open-source community
Module 5 – Cisco AI Hosts
- Cisco UCS portfolio
- Purpose-built AI servers – UCS C8xx series
- UCS X-Series
- UCS C-Series
- Intersight overview
Module 6 – Container and Kubernetes Overview
- Container overview
- Kubernetes (K8s) overview
- Container networking
Module 7 – Red Hat OpenShift Overview
- Red Hat OpenShift overview
- Nodes
- Deployment and use
- CLI – oc and YAML
- GUI
- Projects
- Connectivity
- Monitoring
- Operators
- Red Hat core Operators
- NVIDIA Operators
- Workload deployment
- OpenShift AI overview
- Jupyter Notebooks and code deployment overview
Module 8 – GPUs
- Architecture and Functional Overview
- Cores and processing
- vRAM and memory bandwidth
- UCS GPU options
- GPU sharing
- Host software and drivers (an NVML query sketch follows this module outline)
- Containers and GPUs
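The vRAM, utilization and driver topics in this module can be explored programmatically through NVML, which is exposed by the NVIDIA driver stack. A minimal query sketch, assuming the pynvml bindings (nvidia-ml-py) and an NVIDIA driver are present on the host:

```python
# List each GPU's name, vRAM usage and utilization via NVML (pip install nvidia-ml-py).
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)           # str or bytes, depending on binding version
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)      # values in bytes
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        print(f"GPU {i}: {name}")
        print(f"  vRAM: {mem.used / 2**30:.1f} / {mem.total / 2**30:.1f} GiB")
        print(f"  utilization: GPU {util.gpu}%  memory {util.memory}%")
finally:
    pynvml.nvmlShutdown()
```

The same counters back the nvidia-smi output used for GPU monitoring in Lab 4.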
Module 9 – AI Host Networking
- GPU Clustering
- GPU to GPU connectivity
- DMA
- Intra Node GPU clustering
- NVLink
- Inter Node GPU clustering
- RDMA and GPU Direct
- RoCEv2
- Delay and throughput
- GPU to GPU connectivity
- Host networking
- SmartNICs and SuperNICs
- DPU
- Containers and AI networking
- Host Software and drivers
Module 10 – AI System Scoping
- Design and Scoping Challenges
- Model demands
- Review GPU selection criteria
- Workload demands and impact
- Measuring AI Resource Usage
- GPU sizing tools (a rough sizing sketch follows this module outline)
- Host attributes relevant to GPU functionality
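As a rough illustration of the arithmetic behind the sizing tools above: inference memory is dominated by the model weights (parameter count × bytes per parameter), plus KV cache, activations and runtime buffers. A back-of-the-envelope sketch in Python; the 30% overhead factor is an assumption for illustration, not a Cisco or NVIDIA sizing rule:

```python
# Rough GPU memory estimate for serving a dense LLM for inference.
# Weights dominate: parameters x bytes per parameter, scaled by an assumed
# overhead factor covering KV cache, activations and runtime buffers.

def estimate_vram_gib(params_billion: float, bytes_per_param: float,
                      overhead: float = 1.3) -> float:
    """FP16/BF16 ~ 2 bytes per parameter, INT8 ~ 1, INT4 ~ 0.5."""
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return weight_bytes * overhead / 2**30

# Example: a 70B-parameter model at different serving precisions.
for label, bpp in [("FP16", 2.0), ("INT8", 1.0), ("INT4", 0.5)]:
    print(f"70B @ {label}: ~{estimate_vram_gib(70, bpp):.0f} GiB")
```

Real sizing also has to account for sequence length, concurrency and the chosen serving stack, which is where the vendor sizing tools come in.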
Module 11 – AI Networking Overview
- AI networking Roles
- Backend Network
- AI QoS requirements
- Converged Ethernet
- IP QoS
- Oversubscription and load balancing
Labs
Lab 1 – Model Explore
- Review sources of models
- Explore model attributes and focus
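The lab's model source isn't named in this outline; as one illustration of pulling model attributes programmatically, the Hugging Face Hub exposes them through the huggingface_hub client (gpt2 is only a small public example repo):

```python
# Inspect a public model's attributes on the Hugging Face Hub
# (pip install huggingface_hub). "gpt2" is only an example repo id.
from huggingface_hub import HfApi

api = HfApi()
info = api.model_info("gpt2")

print("task:", info.pipeline_tag)        # e.g. text-generation
print("tags:", info.tags[:10])
print("downloads:", info.downloads)
print("files:")
for sibling in info.siblings[:10]:       # weights, config, tokenizer files
    print("  ", sibling.rfilename)
```

The task, tags and repository files are the kind of attributes this lab explores.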
Lab 2 – Accessing Housley labs
- Connect to the in-house AI lab environment
Lab 3 – OpenShift and Container deployment
- Explore OpenShift functionality and GUI
- Deploy containers (a scripted deployment sketch follows this lab outline)
- External connectivity
- Explore operators
- Jupyter Notebook primer
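The container deployment step above runs through the OpenShift console and oc CLI in the lab; for reference, the same objects can be created against the cluster API. A minimal sketch with the kubernetes Python client; the project name and image are placeholders:

```python
# Create and list a pod via the Kubernetes API (pip install kubernetes).
# Works against OpenShift as well; "my-project" is a placeholder project/namespace.
from kubernetes import client, config

config.load_kube_config()                # reads the kubeconfig written by `oc login`
v1 = client.CoreV1Api()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="hello-pod", labels={"app": "hello"}),
    spec=client.V1PodSpec(containers=[
        client.V1Container(
            name="hello",
            image="registry.access.redhat.com/ubi9/ubi-minimal",
            command=["sleep", "3600"],
        )
    ]),
)

v1.create_namespaced_pod(namespace="my-project", body=pod)
for p in v1.list_namespaced_pod("my-project").items:
    print(p.metadata.name, p.status.phase)
```

A Deployment rather than a bare pod is the usual production object; a single pod simply keeps the example short.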
Lab 4 – Explore and use an AI environment
- GPU monitoring
- Use an AI client GUI – Open WebUI
- Create and query a RAG database
- Explore simple Python code for inferencing
- Explore and query NVIDIA NIM (inferencing server)
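NIM serves an OpenAI-compatible REST API, so the last step can also be scripted outside Open WebUI. A minimal query sketch; the endpoint URL and model id are placeholders for whatever the lab environment exposes:

```python
# Query a NIM endpoint through its OpenAI-compatible chat completions API.
# URL and model id below are placeholders for the lab environment.
import requests

NIM_URL = "http://nim.example.local:8000/v1/chat/completions"
payload = {
    "model": "meta/llama-3.1-8b-instruct",   # example NIM model id
    "messages": [{"role": "user", "content": "Summarise RoCEv2 in one sentence."}],
    "max_tokens": 128,
}

resp = requests.post(NIM_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```

The simple Python inferencing code explored in this lab follows the same request/response shape.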