Cisco AI Infrastructure Power Workshop – 2 days (AI-I-PW2)

Artificial intelligence (AI) is a major focus across every sector of industry and government. It is a rapidly evolving space whose capabilities deliver greater insight, knowledge, and operational efficiency across many areas of operation.

Many businesses have indicated that AI is a strategic objective, but few have advanced to implementation and use.

The aim of this workshop is to explore AI concepts and how Cisco provides an AI-ready data center.

Objectives

  • Provide an overview of how to design and size a Cisco AI POD solution, with a focus on inferencing
  • Provide an overview of extended GPU operations and AI inferencing
  • Provide an overview of GPU operations and their functional role in AI

Target audience:

  • IT Architects and Designers
  • Presales Systems Engineers (SEs)
  • Network Engineers
  • Server Administrators
  • AI Integrators

Prerequisite skills:

  • Network administration skills
  • Understanding of programming concepts
  • Conceptual understanding of VMs and containers
  • Basic knowledge of the Cisco UCS server environment
  • Basic Linux skills

Duration:

2 days

Module 1 – AI Revolution

  • Changing Landscape
  • AI Overview
  • Who will use AI
    • AI Use Cases
  • Location of AI services
  • Introduction to Cisco AI

Module 2 – AI System Component Overview

  • The Anatomy of an AI system
  • Cisco AI Building blocks

Module 3 – AI Apps and Models

  • AI Apps and Model Overview
  • AI challenges
  • AI Adaptation and Enhancement
  • Inferencing Process Overview

Module 4 – Cisco AI Partners

  • Cisco Ecosystem overview
  • NVIDIA Offerings
  • Red Hat OpenShift overview
  • Storage Partners
  • Open-source community

Module 5 – Cisco AI Hosts

  • Cisco UCS portfolio
  • Purpose-built AI servers – UCS C8xx series
  • UCS X-Series
  • UCS C-Series
  • Intersight overview

Module 6 – Container and Kubernetes Overview

  • Container overview
  • Kubernetes (K8s) overview (see the sketch after this list)
  • Container networking
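
To give a feel for the Kubernetes API ahead of the OpenShift labs, here is a minimal sketch using the official Python client. It assumes the kubernetes package is installed and a local kubeconfig that points at a reachable cluster; the "default" namespace is an illustrative choice.

    from kubernetes import client, config

    # Minimal sketch: list pods via the Kubernetes API.
    # Assumes the `kubernetes` package and a local kubeconfig.
    config.load_kube_config()
    v1 = client.CoreV1Api()

    # "default" is an illustrative namespace choice.
    for pod in v1.list_namespaced_pod(namespace="default").items:
        print(pod.metadata.name, pod.status.phase)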

Module 7 – Red Hat OpenShift Overview

  • OpenShift architecture
    • Nodes
  • Deployment and use
    • CLI – oc and YAML
    • GUI
    • Projects
    • Connectivity
    • Monitoring
  • Operators
    • Red Hat core operators
    • NVIDIA operators
  • Workload deployment
  • OpenShift AI overview
  • Jupyter Notebooks and code deployment overview

Module 8 – GPUs

  • Architecture and Functional Overview
    • Cores and processing
    • VRAM and memory bandwidth (see the sketch after this list)
  • UCS GPU options
  • GPU sharing
  • Host Software and drivers
  • Containers and GPUs
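
To make VRAM and device enumeration concrete, here is a minimal sketch, assuming PyTorch with CUDA support is installed on the host, that lists the visible GPUs and their memory:

    import torch

    # Minimal sketch, assuming PyTorch with CUDA support.
    # Enumerate visible GPUs and report total VRAM per device.
    if torch.cuda.is_available():
        for i in range(torch.cuda.device_count()):
            props = torch.cuda.get_device_properties(i)
            vram_gib = props.total_memory / 1024**3
            print(f"GPU {i}: {props.name}, {vram_gib:.1f} GiB VRAM")
    else:
        print("No CUDA-capable GPU visible to this host/container")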

Module 9 – AI Host Networking

  • GPU Clustering (see the sketch after this list)
    • GPU-to-GPU connectivity
      • DMA
    • Intra-node GPU clustering
      • NVLink
    • Inter-node GPU clustering
      • RDMA and GPUDirect
      • RoCEv2
      • Latency and throughput
  • Host networking
    • SmartNICs and SuperNICs
    • DPUs
  • Containers and AI networking
  • Host Software and drivers
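
The collective traffic that GPU clustering carries can be exercised with a short NCCL all-reduce. NCCL chooses the fastest transport it finds: NVLink within a node, RDMA/RoCE (GPUDirect) between nodes. A minimal sketch, assuming PyTorch and a torchrun launch:

    import torch
    import torch.distributed as dist

    # Minimal sketch, assuming PyTorch and a launch such as:
    #   torchrun --nproc_per_node=2 allreduce_demo.py
    # NCCL selects NVLink intra-node and RDMA/RoCE (GPUDirect) inter-node.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    t = torch.ones(1_000_000, device="cuda")
    dist.all_reduce(t)  # element-wise sum across every GPU in the job
    print(f"rank {rank}: value after all-reduce = {t[0].item()}")
    dist.destroy_process_group()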

Module 10 – AI System Scoping

  • Design and Scoping Challenges
  • Model demands
  • Review GPU selection criteria
  • Workload demands and impact
  • Measuring AI Resource Usage
  • GPU sizing tools (see the worked sketch after this list)
  • Host attributes relevant to GPU functionality
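
A first-pass GPU memory estimate is usually just arithmetic on parameter count and precision. A worked sketch of that estimate follows; the model size, precision, and overhead factor are illustrative assumptions, not output from a Cisco sizing tool:

    # First-pass inference sizing; all numbers are illustrative assumptions.
    params = 8e9          # assumed 8B-parameter model
    bytes_per_param = 2   # FP16/BF16 weights
    overhead = 1.3        # rough allowance for KV cache and runtime

    weights_gib = params * bytes_per_param / 1024**3
    print(f"weights alone: {weights_gib:.0f} GiB")         # ~15 GiB
    print(f"with overhead: {weights_gib * overhead:.0f} GiB")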

Module 11 – AI Networking Overview

  • AI networking roles
    • Backend network
  • AI QoS requirements
    • Converged Ethernet
    • IP QoS
  • Oversubscription and load balancing (see the worked example below)
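
Oversubscription reduces to the ratio of host-facing to fabric-facing bandwidth. A worked example with assumed port counts and speeds:

    # Oversubscription for an assumed leaf configuration.
    downlink_gbps = 32 * 400   # 32 host ports at 400G
    uplink_gbps = 8 * 800      # 8 fabric uplinks at 800G
    ratio = downlink_gbps / uplink_gbps
    print(f"oversubscription {ratio:.1f}:1")   # 2.0:1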

Labs

Lab 1 – Model Exploration

  • Review sources of models (see the sketch after this list)
  • Explore model attributes and focus
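
One common model source is the Hugging Face Hub, which can also be browsed programmatically. A minimal sketch, assuming the huggingface_hub package; the text-generation filter is an illustrative choice:

    from huggingface_hub import HfApi

    # Minimal sketch, assuming the `huggingface_hub` package.
    # List a few popular text-generation models (filter is illustrative).
    api = HfApi()
    for m in api.list_models(filter="text-generation",
                             sort="downloads", limit=5):
        print(m.id, m.downloads)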

Lab 2 – Accessing Housley labs

  • Connect to the in-house AI lab environment

Lab 3 – OpenShift and Container deployment

  • Explore OpenShift functionality and GUI
  • Deploy containers
  • External connectivity
  • Explore operators
  • Jupyter Notebook primer

Lab 4 – Explore and use an AI environment

  • GPU monitoring
  • Use an AI client – Open WebUI (GUI)
  • Create and query a RAG database
  • Explore simple Python code for inferencing
  • Explore and query the NVIDIA NIM inferencing server (see the sketch after this list)
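
Because NIM exposes an OpenAI-compatible API, the inferencing code explored in this lab can be as small as the sketch below. It assumes the openai Python package; the base URL, API key, and model name are placeholders for the values used in the lab environment.

    from openai import OpenAI

    # Minimal sketch, assuming the `openai` package is installed.
    # base_url, api_key, and model are lab-environment placeholders.
    client = OpenAI(base_url="http://nim-host:8000/v1", api_key="not-used")

    resp = client.chat.completions.create(
        model="meta/llama-3.1-8b-instruct",   # placeholder model name
        messages=[{"role": "user", "content": "What is RoCEv2?"}],
    )
    print(resp.choices[0].message.content)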