AI Fundamentals and Cisco AI-Ready Infrastructure (AI-PW)
The focus of this Artificial Intelligence (AI) Power Workshop training is to enable Cisco engineers, who have an infrastructure background, to understand the environment upon which AI systems are built.
The aim is not to explore AI programming or model development. However, sufficient detail is provided to understand the requirements of an AI system and how to provision the relevant infrastructure.
The workshop concentrates on provisioning AI in an on-premises environment, with an emphasis on the Cisco AI strategy.
Areas of focus include:
- AI overview and key concepts
- Server components such as GPU
- Networking and inter-device communication requirements
- Common AI environments and tools
- Containerization and bare metal software provisioning
- Cisco AI offerings and frameworks
- Associated partner integrations, including Red Hat and NVIDIA
The sessions are augmented with lab exercises to reinforce the learning objectives.
Target audience:
- IT Architects
- Presales SEs
- Network Engineers
- Server Administrators
Pre-requisite skills:
- Network Admin skills
- Basic Linux operations
- Understanding of programming logic
- Conceptual understanding of VM and containers
- Basic knowledge of Cisco UCS server environment
Duration:
5 days
Module 1 – AI Overview
Module overview
- What it covers
- AI infra engineering
- Build the environment up to the point where:
- Data engineers can apply their models
- An off-the-shelf AI environment can be deployed
- What is not covered
- Customer Business AI requirements
- ML and building AI logic
Overview of AI
- History and evolution
- Hype vs reality
- AI vs intelligence
- The AI landscape: Narrow AI, General AI, Super AI.
- Future long way off
- AI and pattern recognition
- AI type overview
- AI catch all
- ML
- Deep learning
- Supervised and unsupervised
- Learning and knowledge
- Garbage in, garbage out
- Bias
- Hallucinations
Use case
- Industry uses – who is using
- Real-world examples: healthcare, finance, transportation, entertainment, etc.
- Use cases
- Network analysis
- Predictive
- Root cause
- Log analysis
- Security
- Cisco Hypershield
- Fraud detection
- Cars
- Chatbots
- etc
Impact of AI
- Social and jobs
- Work and efficiency
Where are we at now
- Why recent boom
- Hardware
- Transformers
- Code / software availability
- Is the market ready?
- % of businesses that say they are ready
- What do they want
- Next steps and future
- Where is Cisco now
AI operations
- Patterns and predictions
- AI Stages
- Preprocess and validate data
- Training
- Post work
- Validating and adjusting
- Reinforcing
- other
- Inference
- ML algorithms
- Normal calculations using CUDA
- Deep learning
- Neural network overview
- Neurons, layers, activation functions.
- Tensors
- Models
- Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs)
- Transformers
- Natural Language Processing (NLP)
- LLM and text
- Word guess
- Text preprocessing: tokenization, stemming, lemmatization.
- Computer Vision
- Image processing fundamentals.
- Feature extraction: edge detection, corner detection.
- Hybrid models
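To make the neural network items above (neurons, layers, activation functions) concrete, here is a minimal sketch of a forward pass through a two-layer network in plain Python. The weights, biases and layer sizes are hypothetical values chosen for illustration only; real frameworks such as PyTorch perform the same arithmetic on tensors, typically on a GPU.

```python
import math

def relu(x):
    """ReLU activation: pass positive values, clamp negatives to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid activation: squash any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def dense_layer(inputs, weights, biases, activation):
    """One fully connected layer: each neuron computes activation(w . x + b)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        weighted_sum = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(weighted_sum))
    return outputs

# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -1.0], [1.0, 1.0]]
hidden_biases = [0.0, -0.5]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]

def forward(inputs):
    """Run the inputs through the hidden layer, then the output layer."""
    hidden = dense_layer(inputs, hidden_weights, hidden_biases, relu)
    return dense_layer(hidden, output_weights, output_biases, sigmoid)

prediction = forward([1.0, 0.5])
print(prediction)
```

Training adjusts the weights and biases; inference only runs `forward` with fixed values, which is why the two stages place different demands on infrastructure.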
Model overview
- What is a model
- Numbers and naming
- Parameters
- Model Size
- Data set size
- Layers
- FLOPs
- computational cost of a model
- Granularity and accuracy
- Tree structure and weightings
- Model types
- GPT
- Llama
- DALL-E
- Etc
- Common Models
- ChatGPT – OpenAI
- Llama – Meta
- Claude – Anthropic
- Gemini – Google
- Watson – IBM
- Qwen – Alibaba
- DeepSeek
- Learning resources
- Public model sources
- Git
- Ollama
- Hugging Face
- NVIDIA NGC
- {Add URLS}
- Training data sources
- Internet
- Open-source datasets
- In house
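The relationship between parameter count and model size listed above can be illustrated with simple arithmetic. The sketch below estimates the memory needed to hold model weights at different numeric precisions. It is a rough sizing aid under stated assumptions: it counts weights only and ignores activations, KV cache, and framework overhead, all of which add to the real GPU memory requirement. The function name and the 7-billion-parameter figure are illustrative.

```python
# Bytes used to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}

def weight_memory_gb(num_params, precision="fp16"):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9

# Illustrative example: a model with 7 billion parameters.
for precision in ("fp32", "fp16", "int8", "int4"):
    size = weight_memory_gb(7e9, precision)
    print(f"7B parameters at {precision}: about {size:.1f} GB")
```

This is why the parameter count in a model name (for example "7B" or "70B") is a useful first indicator of how much vRAM a deployment will need.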
Lab
- Connect to NVIDIA site and look over the different types of AI
- Look at various repositories and the models they host
Module 2 – AI Deployments
AI Deployment Options
- Deployment Options overview
- Cloud vs on prem
- hybrid
- Turnkey/bundle vs DIY
- Cloud
- Google Cloud AI
- Google Colab
- AWS AI
- Amazon SageMaker Studio
- Azure AI
- IBM
- Other key players?
- On-premises design overview
- Deployment options and big picture
- How it all hangs together
- Blueprints
- Bare metal
- Virtualization and containers
On prem Infrastructure components
- Compute nodes
- Generic
- Cisco models (more later)
- OS
- Drivers
- PCIe
- Operation
- speeds
- CPU vs GPU vs TPU vs DPU vs NPU
- CPU
- Intel Xeon
- 5th Gen with 4th Gen
- AMX and AVX-512
- PyTorch (IPEX)
- DeepSpeed
- multi-CPU inferencing
- AMD
- Embedded Inference capabilities
- GPU operations
- Architecture
- vRAM and cache
- cores
- CUDA
- Tensor
- Parallelisation
- CPU to GPU comms
- Control channels
- GPU cards and models
- NVIDIA
- AMD
- Intel
- Performance metrics and requirements
- Stats and GPU requirements
- GPU to work task expectations
- CPU
- RAM and vRAM
- IO
- Sharing Single GPU
- MIG
- vGPU
- time slicing
- Networking
- NIC and offload
- DPU
- Speeds
- 400-800G
- throughput/bandwidth
- Stats
- MTU
- Switches and fabrics
- Latency, QoS and loss/delay
- Converged Ethernet – PFC
- ToS
- InfiniBand positioning
- Storage and databases
- Size requirements
- Performance considerations
- Connectivity options
- Local
- IP/NAS
- FC
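To relate the link speeds and throughput items above to a practical figure, the following sketch estimates how long it takes to move a given amount of data across a link. It is back-of-the-envelope arithmetic only: the `efficiency` parameter is a hypothetical factor standing in for protocol overhead and congestion, and the 140 GB payload is an illustrative size, not a measurement.

```python
def transfer_seconds(size_gb, link_gbps, efficiency=1.0):
    """Time to move size_gb gigabytes over a link rated at link_gbps gigabits/s.

    efficiency is an assumed fraction (0-1] of the line rate actually achieved.
    """
    bits_to_move = size_gb * 8e9           # gigabytes -> bits
    usable_bits_per_second = link_gbps * 1e9 * efficiency
    return bits_to_move / usable_bits_per_second

# Illustrative example: moving 140 GB of model weights between nodes.
for speed in (100, 400, 800):
    seconds = transfer_seconds(140, speed)
    print(f"{speed}G link: about {seconds:.1f} s at full line rate")
```

The same calculation applies to storage connectivity: a dataset that takes minutes to load over a slow path leaves expensive GPUs idle for that time.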
Containerization and Virtualization
- Ready-made containers vs bare metal
- Container primer
- Container fundamentals
- Runtimes
- Supporting software
- Podman and Docker
- Desktop and CLI
- Options and setup
- Images vs containers
- Image sources/repos
- Config files
- Kubernetes overview
- {Base as very deep – enough for openshift}
- {Flow TBD}
- AI and Containers
- GPUs and Containers
- VMs and AI
- Direct I/O
- VMware
Building an AI environment
- Host setup
- Tools, drivers and software frameworks
- nvidia-smi
- other
- How to run the AI
- Python
- Open-source tools
- Ollama
- Podman AI
- Human interaction
- Open WebUI
- Other
- n8n – https://n8n.io/
- API interaction
- Access to and usability
- Keys
- Tools and orchestrators
- Ansible
- etc
Lab
- Connect to Housley labs
- Connect to VMs/hosts with GPUs
- Build container AI environment
- Explore build environments in a Podman environment
- External and internal to the container
- Run and explore various options with different models
Module 3 – AI Enhancements and Tuning
RAG
- How it works
- Vector DBs
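The RAG flow outlined above (store documents as vectors, retrieve the closest match, add it to the prompt) can be sketched in a few lines. This is a toy illustration under loud assumptions: the `embed` function uses simple word counts in place of a learned embedding model, the in-memory list stands in for a vector database, and the sample documents and function names are made up for the example.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: word counts. Real systems use a learned embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Stand-in for a vector database: each document stored with its vector.
documents = [
    "RoCE carries RDMA traffic over Ethernet",
    "MIG partitions a single GPU into isolated instances",
    "Podman runs containers without a daemon",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question, top_k=1):
    """Return the top_k documents most similar to the question."""
    query_vector = embed(question)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def build_prompt(question):
    """Augment the question with retrieved context before sending it to a model."""
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

prompt = build_prompt("How can a single GPU be shared?")
print(prompt)
```

The model itself is unchanged in this approach; only the prompt is enriched, which is the main contrast with retraining.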
Retraining
- How it works
- Pros and cons
AI Customization
- Prompt engineering
- Pipelines/workflows
- Filtering and privacy
- Search, watch and update
- Scan the web
- Agents
- Tools
Scaling AI
- AI GPU clusters
- What they are used for
- Distribute workload over multiple GPUs
- How it works
- RDMA
- SR-IOV
- Transports
- RoCE
- InfiniBand
- NVLink
- Controls and
Lab
- Add documents to a base model to create a RAG-enhanced environment
- Explore prompt engineering and filtering
- Look at workflows and enhanced AI tools
- Possibly on the web
- Explore a preconfigured clustered GPU environment
Module 4 – Cisco AI
Cisco AI approach
- Roadmap and vision
- CVD – options and overview
- AI stack – Cisco and partners
- Hyperfabric AI pod
Cisco Equipment
- Networking
- Hyperfabric
- Nexus/NDFC
- ACI
- UCS
- C885 and C845
- Architecture
- UCS X
- Intersight and
- Other servers
Partners
- Ecosystem overview
- NVIDIA overview
- Red Hat overview
- Storage
- Pure
- Databases
NVIDIA AI
- NVAIE overview
- Local apps
- DC GPU manager
Red Hat, OpenShift and AI
- Overview
- RHEL and AI
- OpenShift overview and architecture
- AI platform
- extensions
- Operators
Ongoing ownership
- Monitoring and Troubleshooting
- Tools
Lab
- Explore
- lab backend
- Hyperfabric
- Intersight
- NVAIE
- OpenShift
- Monitoring tools and
Module 5 – Customer Home-Grown AI
Development environments overview
- Development environment overview
- Python Libraries overview
- TensorFlow, PyTorch, NumPy, Pandas, Scikit-learn
- Jupyter Notebook
- NVAIE workbench
- overview
- Tools
- OpenShift workbench
- overview
- Tools
Lab
- Build a sample model in the NVAIE workbench and Jupyter Notebook