Oracle AI Foundations

The AI stack consists of: Generative AI, Deep Learning, Machine Learning, and Artificial Intelligence

Artificial Intelligence: Programming machines to imitate human intelligence
Machine Learning: Subset of AI where algorithms are used to learn from past data and predict outcomes on new data or identify trends
Deep Learning: A subset of machine learning where algorithms are modelled to learn from complex data using neural networks.
Generative AI: A type of AI that creates new content

AI Foundations

Artificial General Intelligence (AGI): machines being able to replicate human capabilities such as motor skills, learning, and intelligence.
When you apply AGI to specific and narrow objectives, you get Artificial Intelligence.
2 major reasons why we need AI:
Automation & Decision Making
Creative Support
Commonly Used AI Domains:
Language: Text-related AI tasks use text as the input. In Generative AI tasks, the output text is generated by a model (e.g. ChatGPT).
- Text as Data: Inherently sequential. Sentences with multiple words = tokenization, varying sentence lengths = padding, and similar words = dot or cosine similarity and embeddings.
- Language AI Models = designed to understand, process, and generate natural language (NLP).
- Deep Learning Models that are used for NLP are: Recurrent Neural Networks, which process data sequentially and store a hidden state; Long Short-Term Memory, which processes data sequentially and retains context better through the use of gates; and Transformers, which process data in parallel using self-attention to better understand the context.
Audio & Speech: Can be either Audio-Related or Generative AI.
- Audio & Speech as Data: Digitized snapshots in time (samples). The sample rate is the number of samples per second, e.g. 44.1kHz, and the bit depth is the number of bits in each sample.
- Audio & Speech AI Models = designed to process and manipulate audio and speech.
- Deep Learning models are: Recurrent Neural Networks, Long Short-Term Memory, Transformers, Variational Autoencoders, Waveform Models, & Siamese Networks
Vision: Can be Image Related or Generative AI.
- Image as Data: Images consist of pixels which can be grey scale or colour.
- Vision AI models = designed to process and understand visual information from images and videos
- Deep Learning Models: Convolutional Neural Networks, which detect patterns in images by learning hierarchical representations of visual features; YOLO, which processes the image and detects objects within it; and Generative Adversarial Networks, which generate real-looking images.
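
The text-as-data steps listed under Language above (tokenization, padding, cosine similarity) can be sketched in plain Python. This is only a rough illustration: the whitespace tokenizer, the `<pad>` token, and the 3-number "embeddings" are made-up assumptions, not how a production NLP library does it.

```python
import math

def tokenize(sentence):
    # Naive whitespace tokenizer (assumption: real tokenizers use subword units).
    return sentence.lower().split()

def pad(tokens, length, pad_token="<pad>"):
    # Right-pad (or truncate) so every sequence has the same length.
    return (tokens + [pad_token] * length)[:length]

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

batch = [tokenize("The cat sat"), tokenize("Dogs bark loudly at night")]
max_len = max(len(tokens) for tokens in batch)
padded = [pad(tokens, max_len) for tokens in batch]

# Toy 3-dimensional embeddings (made-up numbers for illustration only).
king, queen, apple = [0.9, 0.8, 0.1], [0.85, 0.9, 0.15], [0.1, 0.2, 0.95]
print(padded[0])
print(cosine_similarity(king, queen), cosine_similarity(king, apple))
```

Similar words should end up with a cosine similarity close to 1, unrelated ones closer to 0.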

OCI AI Services

Vision AI Services allows us to do the following:
Image Classification: Upload an image which gets analyzed and labelled with confidence scores.
Object Detection: Upload the image and then it detects objects with confidence scores.
Text Detection: Upload an image and it extracts all the text from the image.
Document AI: Upload a document and it returns the raw text, assigns key-value pairs, and extracts tables.
Language AI Services:
Text Analytics: Analyzes a block of text and provides language detection, text classification, entity extraction, key phrase extraction, and sentiment analysis. It also detects personally identifiable information (PII).
Text Translation: Translates text from one language to another.

AI vs ML vs DL

Machine Learning Types:
Supervised: Extracting rules from labelled data. For example: approving credit card applications, where the model learns the approval rules from labelled data.
Unsupervised: Extracting trends from unlabelled data. Grouping similar data into clusters, as in retail marketing and sales.
Reinforcement: Solving tasks by trial and error.
Deep Learning is used for extracting features and rules from data, and it uses neural networks with multiple layers.

Machine Learning Foundations

ML provides statistical tools to analyze, visualize, and make predictions from data, like Netflix movie suggestions.
ML uses input features to predict what the output label should be. Train the model with the input features and labels; once the model is trained we can apply inference, which is the ability to predict the label for new data.
Types of Machine Learning:
Supervised, which uses labeled data; unsupervised, where we just understand relationships in the data; and reinforcement, which makes decisions.
Supervised examples include: disease detection, weather forecasting, stock price prediction, spam detection
Unsupervised examples: Fraudulent transactions, outlier detection and targeted marketing campaigns
Reinforcement: automated robots, autonomous cars, healthcare and video games

Supervised Learning: Classification

There are 2 types of output labels:
Continuous - This leads to Regression
Categorical - This leads to Classification, which can be binary or multi-class (different types of the same thing, like 3 different species of a flower)
Classification = a supervised ML technique used to categorize or assign data points into predefined classes based on their features or attributes.
Classification models are trained using a labelled data set

Machine Learning Algorithm for classification is: Logistic Regression: helps in predicting if something is true or false. Logistic regression fits an S-shaped curve (sigmoid) to the data, as opposed to the straight line used in linear regression.
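
A minimal sketch of how the sigmoid turns a linear score into a true/false prediction. The weight, bias, and 0.5 threshold are hypothetical values picked for illustration; in practice they are learned from labelled data.

```python
import math

def sigmoid(z):
    # S-shaped curve that squashes any number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weight, bias):
    # Linear score passed through the sigmoid gives a probability.
    return sigmoid(weight * x + bias)

def predict_label(x, weight, bias, threshold=0.5):
    # True/false decision: compare the probability with a threshold.
    return predict_proba(x, weight, bias) >= threshold

WEIGHT, BIAS = 1.5, -3.0  # hypothetical parameters, not learned here
for x in (0.0, 2.0, 4.0):
    print(x, round(predict_proba(x, WEIGHT, BIAS), 3), predict_label(x, WEIGHT, BIAS))
```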

Supervised Learning: Regression

Independent features are the inputs and the dependent feature is the output label
Uses linear regression
Loss is a number indicating how far the predicted value is from the actual value
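
A small sketch of linear regression with one input feature and a loss value. It assumes ordinary least squares for the fit and mean squared error as the loss; the data points are made up so that they lie exactly on y = 2x + 1.

```python
def fit_line(xs, ys):
    # Ordinary least squares for a single input feature.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(actual, predicted):
    # Loss: average squared distance between actual and predicted values.
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

xs = [1.0, 2.0, 3.0, 4.0]   # independent feature (made-up data)
ys = [3.0, 5.0, 7.0, 9.0]   # dependent feature, here exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
predictions = [slope * x + intercept for x in xs]
loss = mean_squared_error(ys, predictions)
print(slope, intercept, loss)
```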

Jupyter Notebooks

Anaconda is an open-source distribution of Python and R for data science and machine learning. It helps with package management and deployment. Anaconda includes Jupyter Notebook, a web-based interactive environment that lets you create and share documents containing code.
Launching it opens a browser view of your files served on localhost.
Machine Learning Process consists of:
Loading Data
Preprocessing - this involves creating features and labels
Training a Model
Evaluating the Model
Making Predictions
Important ML Libraries in Python from scikit-learn (sklearn)
train_test_split - this function splits the data into 2 sets, one to train the model and the other to test it
StandardScaler - transforms data so it has a mean of 0 and a standard deviation of 1, making sure all features use the same scale, e.g. square feet vs number of beds when predicting the price of a house.
accuracy_score - gives the fraction of predictions that match the actual labels (classification)
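
To see what the three helpers above compute, here is a rough standard-library sketch. The functions `split`, `standardize`, and `accuracy` are hypothetical stand-ins written for these notes, not scikit-learn's own code, and the house data is made up.

```python
import random
import statistics

def split(rows, test_fraction=0.25, seed=0):
    # Shuffle a copy, then cut it into a training part and a test part.
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def standardize(values):
    # Rescale to mean 0 and standard deviation 1 (population std).
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

def accuracy(y_true, y_pred):
    # Fraction of predictions that match the actual labels.
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

square_feet = [800.0, 1200.0, 1500.0, 2500.0]  # made-up house sizes
bedrooms = [1.0, 2.0, 3.0, 4.0]                # made-up bedroom counts
scaled_sqft = standardize(square_feet)
scaled_beds = standardize(bedrooms)
train_rows, test_rows = split(list(range(8)))
print(scaled_sqft, scaled_beds)
print(len(train_rows), len(test_rows), accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

After scaling, square feet and bedrooms are on the same scale, so neither feature dominates just because its raw numbers are bigger.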
Unsupervised Learning:
There are no labelled outputs
Algorithms learn the patterns in the data and group similar data items together.
Clustering: is the grouping of similar data items
Similarity: is how close two data points are to each other. It is a value between 0 and 1 and determines which cluster objects belong to.
Unsupervised Workflow:
1. Prepare the data (remove missing values and normalize)
2. Create similarity metrics
3. Run the clustering algorithm (partition, density, hierarchical and distribution)
4. Interpret results and adjust clustering
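
The workflow above can be sketched with a tiny partition-based clustering run. This assumes k-means on 1-D points with hand-picked starting centroids and uses absolute distance as the (dis)similarity measure; the numbers are made up.

```python
def kmeans_1d(points, centroids, iterations=10):
    # Partition-based clustering: assign each point to its nearest centroid,
    # then move each centroid to the mean of the points assigned to it.
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]   # made-up, already cleaned data
centroids, clusters = kmeans_1d(points, centroids=[1.0, 11.0])
print(centroids, clusters)
```

Interpreting the result here is easy (two obvious groups); with real data you would adjust the number of clusters and rerun.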
Reinforcement Learning:
A type of machine learning that enables an agent to learn from its interactions with the environment.
agent = interacts with the environment, takes actions, and learns from feedback
environment = external system with which the agent interacts
state = representation of the current situation of the environment
action = possible moves or decisions that the agent can take
policy = mapping that the agent uses to decide which action to take
optimal policy = the policy that yields the highest cumulative reward. The algorithms used to find it are: Q-learning or Deep Q-learning
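
A minimal tabular Q-learning sketch tying the terms above together. The corridor environment (4 states, reward only at the last one), the learning rate, and the discount factor are all assumptions made for illustration. To keep the run deterministic, it sweeps over every state-action pair instead of exploring randomly.

```python
N_STATES = 4          # states 0..3; state 3 is the goal (terminal)
ACTIONS = [0, 1]      # 0 = move left, 1 = move right
ALPHA, GAMMA = 0.5, 0.9   # learning rate and discount factor (assumed values)

def step(state, action):
    # Environment: returns the next state and the reward for one move.
    next_state = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

# Q-table: estimated future reward for each (state, action) pair.
q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(50):
    for state in range(N_STATES - 1):
        for action in ACTIONS:
            next_state, reward = step(state, action)
            # Q-learning update: move the estimate toward
            # reward + discounted best value of the next state.
            target = reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])

# Policy: in each state, pick the action with the highest Q-value.
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]
print(policy, q)
```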

Deep Learning Foundations

A subset of machine learning that focuses on training Artificial Neural Networks (ANNs) with multiple layers
ML needs us to specify features, whereas Deep Learning extracts features from raw and complex data. DL algorithms allow parallel processing of data, so they have better scalability and performance.
GPUs are needed for this complex learning and these algorithms
Deep Learning algorithms can be grouped in two ways: by data (images, videos, text, audio) and by application (image classification, face detection, NLP)
Deep Learning Algorithm for images is Convolutional Neural Networks (CNN)
For text we use Transformers, Long Short-Term Memory (LSTM) or Recurrent Neural Networks (RNN)
For images, audio and text generation we can use: Transformers, Diffusion models, and Generative Adversarial Networks (GAN)
Building Blocks of ANN:
Layers, which are the input, hidden, and output layers
Neurons are computational units that accept input and apply their activation function to produce an output
Weights, which determine the strength of connection between neurons
Activation Functions work on the weighted sum of inputs to a neuron to produce an output
Bias: an additional input to a neuron that allows a certain degree of flexibility
ANNs are trained using the Backpropagation Algorithm: guess, compare with the expected output, and measure the error. Then the weights are adjusted to reduce the error. Repeat until the model has learned.
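
A rough sketch of the guess, compare, adjust loop for a single neuron (weights, bias, sigmoid activation). The OR-gate data, learning rate, and epoch count are assumptions for illustration; a real network repeats the same idea across many neurons and layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, bias):
    # Neuron: activation function applied to the weighted sum plus bias.
    return sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)

def total_error(data, weights, bias):
    # Sum of squared differences between guesses and targets.
    return sum((forward(x, weights, bias) - t) ** 2 for x, t in data)

def train(data, epochs=2000, learning_rate=1.0):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for inputs, target in data:
            output = forward(inputs, weights, bias)            # guess
            # Error signal: (guess - target) times the sigmoid slope.
            delta = (output - target) * output * (1.0 - output)
            # Adjust each weight and the bias against the error gradient.
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias

OR_GATE = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
weights, bias = train(OR_GATE)
print(total_error(OR_GATE, [0.0, 0.0], 0.0), total_error(OR_GATE, weights, bias))
```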
Deep Learning Models for Sequence Models
Sequence models take input data in the form of sequences, and the goal is to find patterns and make predictions. Used for NLP, speech recognition, gesture recognition, etc.

-Recurrent Neural Networks (RNN) handle sequential data. There is a feedback loop that maintains a hidden state (memory), which is updated as each element in the sequence is processed, so it can capture dependencies.
-Architecture:
   1. One to One: used for non-sequential data
   2. One to Many: music generation or sequence generation
   3. Many to One: sentiment analysis
   4. Many to Many: machine translation and entity recognition
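
A tiny sketch of the RNN feedback loop in a many-to-one setup: the hidden state is updated at every element and only the final state is returned. The scalar weights are hypothetical values; real RNNs use learned weight matrices.

```python
import math

def rnn_final_state(sequence, w_input=0.5, w_hidden=0.8, bias=0.0):
    # The hidden state carries memory from earlier elements to later ones.
    hidden = 0.0
    for x in sequence:
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
    return hidden

# Same values, different order: the final state differs, so order matters.
print(rnn_final_state([1, 0, 0]))
print(rnn_final_state([0, 0, 1]))
```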

-==**Long Short Term Memory**==: Works by using a specialized memory cell and gating mechanisms to capture long term dependencies, which RNN is not good at.
- Input processing at step 1, then it receives the previous hidden state and memory values, then the gating mechanism (input gate, forget gate, and output gate) is applied, then it updates the memory, and then it generates the output.

-==**Convolutional Neural Networks**==: 
   -Deep Learning Models: 
       1. `Feed Forward Neural Networks (FNN)` - multi layer perceptron MLP (simplest)
       2. `Convolutional Neural Networks (CNN)` - good for image and video
       3. `Recurrent Neural Networks (RNN)` - good for time series and sequential data
       4. `Autoencoders` - unsupervised learning models used for feature extraction
       5. `Long Short Term Memory` - specialized RNN for long term dependencies
       6. `Generative Adversarial Network (GAN)` - produces images and content
       7. `Transformers` - used for language processing

    - CNN: processes grid-like data such as images and videos. CNN works well with 2D data by reducing images into an easier-to-process form.

    - CNN Layers:
       1. Input Layer 
       2. Feature Extraction Layers: automatically learn and extract patterns from the input images
          -convolutional layer (uses small filters called kernels)
          -activation function allows the network to learn more complex non linear patterns
          -pooling layer reduces computational complexity
       3. Classification Layers
          -fully connected layer
          -softmax layer
          -dropout layer
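
A plain-Python sketch of the feature extraction steps in the CNN layer list above (convolution with a small kernel, ReLU activation, max pooling). The 5x5 image and the vertical-edge kernel are made up, and the kernel is applied without flipping, which is how deep learning frameworks usually define "convolution".

```python
def convolve2d(image, kernel):
    # Slide the kernel over the image and sum the element-wise products.
    kh, kw = len(kernel), len(kernel[0])
    output = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        output.append(row)
    return output

def relu(feature_map):
    # Activation: keep positive responses, zero out the rest.
    return [[max(0, v) for v in row] for row in feature_map]

def max_pool(feature_map, size=2):
    # Pooling: keep the largest value in each size x size block.
    return [[max(feature_map[r + i][c + j]
                 for i in range(size) for j in range(size))
             for c in range(0, len(feature_map[0]) - size + 1, size)]
            for r in range(0, len(feature_map) - size + 1, size)]

image = [[0, 0, 1, 1, 1] for _ in range(5)]   # dark left, bright right
kernel = [[-1, 1], [-1, 1]]                   # responds to vertical edges
features = relu(convolve2d(image, kernel))
pooled = max_pool(features)
print(features)
print(pooled)
```

The 4x4 feature map shrinks to 2x2 after pooling while the edge response is kept, which is what reduces the compute for later layers.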

    - Limitations of CNN:
       - Computation: needs a lot of data and compute
       - Overfitting: happens with limited training data
       - Interpretability: black box models
       - Sensitivity: sensitive to input variations

    - Main use of CNN is image classification, object detection, image segmentation, face recognition.

Generative AI & LLM Foundations

Intro to GenAI

Subset of Deep learning where the models are trained to generate output on their own.
GenAI models learn the underlying patterns in a given data set and use that knowledge to create new data.

-ML identifies patterns in data in order to recognize and classify.
-Inference is the ability to predict based on the training that was done before.
-ML focuses on learning the relationship between the data and the label.
-GenAI learns patterns in unstructured content and doesn't need labelled data to train.
-The output of ML is a label, whereas in GenAI it is new content.

2 types of Gen AI Models
Text-Based: models generate text, code, dialogue and they learn from large collections of text data
Multimodal: process multiple modalities like text, images, videos, etc.

Intro to LLM

A language model (LM) is a probabilistic model of text.
It helps determine what the next word in a sentence will be by assigning a probability to every word in its vocabulary.
Large in LLM refers to the number of parameters.
-EOS stands for end of sentence or end of sequence.
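
A small sketch of "a probability for every word in the vocabulary". The 4-word vocabulary and the raw scores (logits) are made up; a real LLM produces scores over tens of thousands of tokens, and softmax turns them into probabilities.

```python
import math

def softmax(scores):
    # Turn raw scores into probabilities that sum to 1.
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

vocab = ["mat", "dog", "moon", "<EOS>"]   # hypothetical tiny vocabulary
logits = [3.0, 1.0, 0.5, 0.2]             # made-up scores after "The cat sat on the"
probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]   # greedy choice: most likely word
print(dict(zip(vocab, probs)), next_token)
```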

LLM Features:
- Based on Transformers, which allows them to pay attention to specific parts of the input and gives them enhanced contextual understanding.
- Deep neural networks that are trained on a large set of text.
- Parameters are adjustable weights in the model's neural network.
- Model size is the memory required to store the model's parameters.

Transformers
- Recurrent Neural Networks handle sequential data like a sentence, and they have a feedback loop that allows them to store and maintain a hidden state, but RNNs have trouble with long-range dependencies.
- As the length of the sentence grows, RNNs run into the vanishing gradient problem, which means they lose the context of the entire sequence.

Transformers Architecture:
They understand the relationship between all the words in a sentence at the same time and understand how they relate to each other.
Attention Mechanism (Self Attention) is used by transformers to add context to the text, and it also helps with long range dependencies
Transformer has 2 main parts:
Encoder: reads the input text and encodes it into embeddings that capture its meaning using the attention mechanism. Used for semantic search.
Decoder: uses these embeddings to generate the output text, one next word (token) at a time. Decoders only generate a single token at a time. Used for text generation.
Tokens: LLMs understand tokens instead of words. A token can be a part of a word, an entire word, or a punctuation mark
Embeddings: numerical representation of a piece of text converted to number sequences. They can also be used in semantic search in a vector database
Words get converted into tokens then into embeddings (vector data)
Encoder-Decoder: the encoder encodes a sequence of words into a set of vectors and the decoder generates the output sequence from the set of vectors. Here the decoder has a self-referential loop, as each generated token is fed back in while it generates the rest of the sequence. Used for machine translation
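
A simplified sketch of self-attention, where every token looks at every other token at the same time. It assumes scaled dot-product attention with the embeddings used directly as queries, keys, and values (real transformers first apply learned projections); the 2-number embeddings are made up.

```python
import math

def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(embeddings):
    # Each token scores itself against all tokens, then takes a
    # weighted average of all embeddings using those scores.
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        all_weights.append(weights)
        outputs.append([sum(w * value[i] for w, value in zip(weights, embeddings))
                        for i in range(dim)])
    return outputs, all_weights

tokens = ["the", "cat", "sat"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # made-up vectors
outputs, attention_weights = self_attention(embeddings)
for token, weights in zip(tokens, attention_weights):
    print(token, [round(w, 3) for w in weights])
```

Each row of weights sums to 1 and says how much that token draws on every token in the sentence, which is where the added context comes from.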

Prompt Engineering

Prompt: the input or the initial text provided to the model
Prompt Engineering: the process of iteratively refining a prompt for the purpose of eliciting a particular style of a response.
Instruction tuning is a critical step in LLM alignment. It involves fine-tuning a pre-trained LLM on a varied set of instructions, each paired with a desired output.
Reinforcement Learning From Human Feedback is used to fine tune LLMs to follow a broad class of written instructions.
In-context Learning: prompting an LLM with instructions and or demonstrations of the task it is meant to complete
k-shot prompting: explicitly providing k examples of the intended task in the prompt
Chain of Thought Prompting: provide examples in the prompt whose responses include the reasoning steps and calculation logic that lead to the final answer, before giving the final answer.
Hallucination: model-generated text that is non-factual and ungrounded.
-Retrieval-augmented models hallucinate less than models using 0-shot prompting.
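
A minimal sketch of k-shot prompting: the prompt is just text that contains the instruction, k worked examples, and the new input. The `Review:` / `Sentiment:` layout and the helper name are hypothetical; any consistent format works.

```python
def build_k_shot_prompt(instruction, examples, query):
    # k examples of the task go into the prompt before the real question.
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

examples = [                      # k = 2 demonstrations (made up)
    ("Great battery life.", "positive"),
    ("Screen cracked in a week.", "negative"),
]
prompt = build_k_shot_prompt(
    "Classify the sentiment of each review.", examples, "Fast delivery, works well.")
print(prompt)
```

With an empty `examples` list the same function produces a 0-shot prompt.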

Customizing LLMs with your Data

Prompt engineering is the easiest to start
If you need more context, then use Retrieval-Augmented Generation (RAG)
If the model needs to follow more instructions, use fine-tuning
Usually you need to use all of them
Retrieval Augmented Generation (RAG): the language model queries enterprise knowledge bases (databases, wikis, vector databases) to provide grounded responses. RAG does not require fine-tuning
Augmented Generation = providing a more concrete answer using the retrieved information.
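
A rough sketch of the RAG flow: retrieve the most relevant text from a knowledge base, then add it to the prompt so the answer is grounded. Retrieval here is simple word overlap, which is an assumption to keep the example small (real systems usually use embeddings and a vector database), and the knowledge base is made up.

```python
def score(question, document):
    # Relevance = number of words the question and document share.
    question_words = set(question.lower().split())
    document_words = set(document.lower().split())
    return len(question_words & document_words)

def retrieve(question, documents, top_k=1):
    # Retrieval step: keep the top_k most relevant documents.
    ranked = sorted(documents, key=lambda doc: score(question, doc), reverse=True)
    return ranked[:top_k]

def build_grounded_prompt(question, documents, top_k=1):
    # Augmentation step: put the retrieved text in front of the question.
    context = "\n".join(retrieve(question, documents, top_k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

knowledge_base = [   # made-up enterprise snippets
    "Refunds are processed within 5 business days.",
    "Support is available by chat on weekdays.",
]
prompt = build_grounded_prompt("How long are refunds processed?", knowledge_base)
print(prompt)
```

The prompt would then be sent to the LLM, with no fine-tuning involved.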
LLM Fine-tuning & Inference: take a pre-trained foundational model and provide additional training using custom data
-inference: the model receives new text as input and generates output based on what it learned during pre-training and fine-tuning.
-benefits: improves model performance on specific tasks and improves model efficiency
Generative AI creates new content rather than predicting a label
Sequence models are indeed well-suited for tasks involving sequentially ordered data points or events, such as time series analysis, natural language processing, speech recognition, and language translation. However, for image classification and object recognition, traditional machine learning models and convolutional neural networks (CNNs) are more commonly used.

OCI AI Portfolio

Data -> Infrastructure -> AI Services -> SaaS Apps
No infrastructure needs to be managed.
Ways to access Oracle Cloud Infra:
OCI Console: a browser-based interface for all the features needed for data science, like notebooks
REST API
Language SDKs: SDKs for several programming languages
Command Line Interface: provides quick access and full functionality without scripting

Overview of AI Services:

Language: Text analysis at scale using pretrained models and custom models
Vision: upload images to detect and classify objects using pretrained and custom models
Speech: convert media files into readable text
Document Understanding: upload documents to detect and classify text.
Digital Assistant: Platform used to create and deploy digital assistants using natural language conversations.

Overview of ML Services:

3 core principles of OCI Data Science:
Accelerated: allows data scientists to work the way they want without needing to manage the infra
Collaborative: allows them to work together using Projects (notebook sessions) and Conda environments
Enterprise-Grade: Fully managed infra, updates, and security
OCI Data Science is used to build, train, and deploy ML models and it serves Data Scientists.
Accelerated Data Science (ADS) SDK provides data scientists with libraries that help with their work
Model Catalog is a centralized repo where model artifacts are stored.
Model Deployments (to an HTTP Web app) -> Jobs

AI Infrastructure

GPU: Hardware that performs simple operations and allows many processes to run with parallel computing.

GPU & Superclusters in OCI

RDMA: Remote Direct Memory Access - a technology that allows network communication without any CPU involvement, which lets GPUs communicate with low latency. This is the core that Oracle's database services are built upon
RoCE (pronounced "rocky") = RDMA over Converged Ethernet
Oracle partnered with NVIDIA because there is a high demand for high compute GPUs that can run within a single RDMA network (i.e. a Supercluster)

RDMA Supercluster: designed to support a large number of GPUs.
-Each GPU node connects to the network fabric, and any GPU can talk to any other GPU through the fabric
-Supercluster = much larger than a typical cluster, and it has 2 blocks. One block uses a Clos fabric, a 3-tier network whose silicon chips and buffers counteract the latency that might occur with a large number of GPUs
-Supercluster = lossless
-Using placement they are able to balance scalability and latency
-OCI AI Superclusters are specifically designed to handle demanding AI workloads that require significant computational power and scalability. They are optimized to provide high performance for complex tasks like training large machine learning models, deep learning, and other compute-intensive AI tasks.
-Dedicated AI Clusters provide GPU-based compute resources required to fine-tune a pre-trained model for specific tasks like customer support.

Responsible AI

Guiding Principles for AI to be trustworthy:
AI should follow applicable laws
AI should be ethical: Human Ethics and AI Ethics - used to help humans, prevent harm, and be fair and explicable
AI should be robust

-Responsible AI Requirements:
1. Set up governance
2. Develop policies and procedures
3. Ensure compliance
-Roles: developers, deployers, and end users

OCI Generative AI Services

Fully managed service that provides a set of customizable LLMs available through APIs
Choice of Models - high performing models from Meta and Cohere
Flexible Fine-Tuning to create custom models by fine tuning models using your own data set
Dedicated AI Clusters that host your workloads
2 types of Pretrained Foundational Models:
1. Chat Models: such as command-r-plus, command-r-16k, and llama 3-70b-instruct. Llama is made by Meta and the first two by Cohere. R plus is more expensive and is used for more complex scenarios. You ask questions and get conversational responses; these models have gone through instruction tuning
2. Embedding Models: such as embed-english-v3.0 and embed-multilingual-v3.0. Text is converted to vector embeddings used for semantic search, and multilingual models are available.
Fine tuning is used when a pretrained model isn't performing well on your task or if you want to teach it something new.
T-Few Fine Tuning is what Cohere uses, and it enables fast and efficient customizations - it adds new layers and only updates a fraction of the model's weights, so you don't have to fine tune everything, which takes longer and costs more

-Preamble just changes the behaviour of the model but it is not fine-tuning

Vector Search

AI Vector search is built into the Oracle Database 23ai
Works on structured and unstructured data
Has SQL support for vector generation, a VECTOR data type, vector indexes, and search syntax similar to standard SQL.
Process: Load images as BLOB -> Vector Embedding -> store in DB -> Vector Search for similar matches
Vector Datatype: Specifying the dimension format (e.g. int, float) is optional. The VECTOR datatype in Oracle Database 23ai is specifically designed to store embeddings for AI Vector Search. This datatype allows efficient storage and retrieval of high-dimensional numerical representations of data, enabling similarity searches for AI and machine learning applications.
Vector Distance Function shows the similarity between vectors. Vectors that have a small distance are more similar.
Vector Search SQL is used to find the top k closest matches to a given query item using vector_distance
Vector Indexes are used not only for performance; they also control the accuracy through the organization and distance parameters. Organization depends on whether the index will fit in memory: if it fits, use INMEMORY NEIGHBOR GRAPH, and if it doesn't, use NEIGHBOR PARTITIONS
Target accuracy is a clause added to indicate the default accuracy the index should provide for similarity queries
Approximate keyword indicates that the user wants to perform a similarity search using a vector index.
You can also perform similarity search over joins
Allows you to efficiently orchestrate Gen-AI pipelines.
Model endpoints allow deployed models to be accessed via an API for real-time inference, making them available for AI applications.
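
The top-k similarity search described in this section can be illustrated outside the database with a short sketch. This is not Oracle's SQL syntax or implementation: it assumes cosine distance as the distance function and an exact (non-indexed) scan, and the file names and vectors are made up.

```python
import math

def cosine_distance(a, b):
    # Small distance = more similar vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def top_k(query, rows, k):
    # Exact search: sort every stored row by distance to the query vector.
    return sorted(rows, key=lambda row: cosine_distance(query, row[1]))[:k]

rows = [   # (item, embedding) pairs standing in for a table with a vector column
    ("cat.jpg", [0.9, 0.1, 0.0]),
    ("dog.jpg", [0.8, 0.3, 0.1]),
    ("car.jpg", [0.0, 0.2, 0.9]),
]
matches = top_k([1.0, 0.0, 0.0], rows, k=2)
print([name for name, _ in matches])
```

A vector index trades a little of this exactness for speed, which is what the target accuracy setting controls.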

Select AI

Use your own language to query the data (Autonomous Database); you don't need to know where the data is or how to access the database
It takes the natural language question to form a SQL query

OCI AI Services

OCI Language detects the language of your text, identifies entities in your text, identifies sentiment for each aspect of the text, identifies key phrases that represent important ideas or subjects, and classifies the general topic from a list of 600 categories
OCI Speech converts speech to text using deep learning techniques. It also supports SRT closed captions.
It also normalizes text to more concise versions of the text (words to numbers).
Has profanity filtering: remove, mask (removes but leaves the first letter) and tag.
OCI Vision works on images and provides image analysis and document AI.
Image Analysis: Object Detection detects objects inside an image with a bounding box and a label. It can also detect text. Image Classification labels the scene, and you can retrain on your own data for specific needs.
Document Understanding is used for understanding document images
-features: text recognition (OCR) from images, document classification based on visual appearance, language detection, table extraction, and key-value extraction

Additional Notes

Model training = establishing a relationship between input and output parameters
Gen AI aims to understand underlying data distribution and creates new examples.
Recommendations are given based on the user's past choices or similar user or product choices. Hence it is an example of Supervised Machine Learning.
OCI Speech's SRT file support is the best choice. This allows captions to be added easily to videos in an industry-standard format.
Predicting a house price, which is a numerical value, is an example of supervised machine learning, more specifically a Regression algorithm.
The GB200 GPU is a next-generation Grace Blackwell GPU designed for exascale AI and HPC workloads, making it more suitable for massive-scale AI training rather than standard large-scale AI workloads.
Once a model is trained, it needs to be deployed for real-time inferencing using OCI Data Science and GPU Compute. This allows the model to process new data efficiently.
Tokens are the fundamental units of text that Large Language Models (LLMs) process. A token can be a word, subword, or character, depending on the tokenization method used. The model interprets and generates text based on these tokens rather than entire sentences or paragraphs at once.
Hidden layers take the input from the input layer or another hidden layer, multiply it by weights, and apply activations. The input layer accepts input and the output layer outputs the final result.
Predicting the next note in music needs the context of prior notes. For this RNN is well suited.
OCI AI Infrastructure includes NVIDIA GPUs, OCI Storage, and RDMA Networking for high-performance AI and ML workloads. However, OCI Vault is primarily used for securing and managing cryptographic keys and secrets, not AI infrastructure.
Oracle Database 23ai allows ONNX models to be loaded into the database, enabling vector embedding generation and similarity searches.
K-Nearest Neighbors (KNN) is considered a non-parametric algorithm: Unlike parametric models (e.g., linear regression, neural networks), KNN doesn't have any parameters that need to be learned from the data. The only parameter to tune is the number of nearest neighbors (K).
Organize text based on "politics", "sports" or "news" = Text Classification
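
A short sketch of the K-Nearest Neighbors note above: there is no training step, the stored points are the model, and K is the only thing to tune. The 2-D points and labels are made up, and Euclidean distance with majority voting is assumed.

```python
import math
from collections import Counter

def knn_predict(train, point, k=3):
    # Find the k stored points closest to the new point...
    nearest = sorted(train, key=lambda item: math.dist(item[0], point))[:k]
    # ...and let them vote on the label.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [   # made-up labelled points
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
print(knn_predict(train, (2, 2)), knn_predict(train, (7, 9)))
```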

-The NVIDIA A100 GPU is widely used for small to medium-scale AI training and inference workloads, offering high-performance compute capabilities, tensor cores, and scalability. While H200 is a newer high-memory variant, the A100 remains a strong choice for efficient AI workloads.
-Object detection is implemented using Deep Learning. Hence the answer is Deep Learning.
-To retain the words but mark them, tagging is the correct choice. This method leaves the words in place while adding labels to indicate profanity.
-Loss function checks the difference between the actual value and the predicted value.
-Generative AI models do not require labeled data in the pre-training stage. Instead, they learn patterns from vast amounts of unstructured data, enabling them to generate new, unique outputs.
-Spam detection is a supervised machine learning problem and NOT an unsupervised learning example.
-Detecting pedestrians and making lane changes is similar to human behaviour. Hence the answer is Artificial Intelligence.
-Predicting the next word given a sequence of words needs the context of the prior words in the sequence. RNN is well suited for this.
-The target variable refers to the desired outcome. It could be a numerical value or a label, e.g. spam or not spam, or predicted rainfall in millimeters.
-Select AI translates natural language into SQL by leveraging large language models (LLMs) to infer intent and construct the required SQL query.
-Detecting spam is a classification problem. Hence the answer is Machine Learning, specifically Supervised Machine Learning.