Underflow - numbers that are so small they get rounded to zero, which can later cause a NaN if, say, we try dividing by that zero

Overflow - numbers that are so big they can’t be represented and become infinity; later operations on the infinity (like inf - inf) can produce a NaN value
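Both failure modes are easy to demonstrate with NumPy's 32-bit floats (a minimal sketch; the constants are chosen only to trigger the rounding):

```python
import numpy as np

tiny = np.float32(1e-46)          # smaller than float32 can represent
print(tiny)                       # 0.0 (underflow)
print(tiny / tiny)                # 0/0 -> nan

huge = np.exp(np.float32(100.0))  # e^100 is larger than float32's max
print(huge)                       # inf (overflow)
print(huge - huge)                # inf - inf -> nan
```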

Architecture / neural network - the composition of functions for which we are trying to learn weights.

Gradient descent - An algorithm for finding the minimum of a function using gradients and a learning rate. The basic rule is to move in the direction of steepest descent, taking a step equal to the learning rate multiplied by the gradient.
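A minimal sketch of the update rule on a toy function (the function and learning rate here are purely illustrative):

```python
def grad(w):
    # gradient of f(w) = (w - 3)^2, which is minimized at w = 3
    return 2 * (w - 3)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)  # step opposite the gradient
print(w)  # converges toward 3.0
```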

Backpropagation - a way of getting the gradients for the weights in every layer of the network at once. The simplest case is one datapoint that we run through our network. After getting the prediction, we can use the true label to get a score of how far off we are - the “loss function”. The further off our prediction was, the more we want to update the weights. The gradient of the loss with respect to each layer's weights is computed by walking backward from the loss through the network, applying the chain rule at every layer.
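A minimal worked example of the chain rule for one weight (toy numbers, a single linear unit with squared loss):

```python
# forward pass: one datapoint through a single linear "layer"
x, y_true, w = 2.0, 10.0, 3.0
y_pred = w * x                    # prediction: 6.0
loss = (y_pred - y_true) ** 2     # squared loss: 16.0

# backward pass: chain rule, dloss/dw = dloss/dy_pred * dy_pred/dw
dloss_dy = 2 * (y_pred - y_true)  # -8.0
dy_dw = x                         # 2.0
dloss_dw = dloss_dy * dy_dw       # -16.0, so increasing w reduces the loss
```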

Saturated Gradient - when an activation function flattens out (e.g. sigmoid or tanh at large-magnitude inputs), its gradient approaches zero, so the weights feeding into it barely update and learning stalls.

Matmul / Matrix Multiplication - a linear transformation (affine once a bias is added), which can also be viewed as a weighted lookup table https://e2eml.school/transformers.html#matrix_multiplication
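The “lookup table” intuition, sketched with NumPy (the matrix values are just illustrative): a one-hot row vector times a matrix selects a single row, and a soft vector returns a weighted blend of rows.

```python
import numpy as np

table = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])      # each row is an "entry" in the table

one_hot = np.array([0.0, 1.0, 0.0])
print(one_hot @ table)              # [3. 4.] -> exactly row 1

weights = np.array([0.5, 0.5, 0.0])
print(weights @ table)              # [2. 3.] -> weighted blend of rows 0 and 1
```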

Register - a small, fast memory location inside the CPU. The processor can read or write a register once per clock cycle.

Affine Transformation - a linear transformation plus a translation; in network terms, a matrix multiplication plus a bias vector.

Computation graph - a directed graph whose nodes are operations and whose edges are the data (tensors) flowing between them; TensorFlow builds one to represent the network before executing it.

Bytecode - an intermediate representation between Python code and machine code. When a Python program is run, it is first compiled into bytecode, which the CPU cannot run directly. The Python interpreter (the Python virtual machine) then executes the bytecode instruction by instruction. [^BytecodeMachineCode]
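You can inspect the bytecode CPython generates using the standard-library dis module:

```python
import dis

def add_one(x):
    return x + 1

dis.dis(add_one)
# Prints the bytecode instructions, e.g. LOAD_FAST, LOAD_CONST,
# BINARY_ADD (BINARY_OP on newer Pythons), RETURN_VALUE;
# exact opcodes vary by Python version.
```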

Assembly language - a human-readable representation of machine code

Machine Code - instructions that the CPU can execute directly, encoded in binary [^BytecodeMachineCode]

Opcode - the machine-dependent instruction code that tells the CPU which operation to perform

Protobuf - you may be familiar with the .pb extension, which marks a protobuf file. Protobuf, similar in purpose to XML, is a platform-neutral way to serialize structured data. For TensorFlow, a .pb file holds both the compute graph of the network and the weights themselves.
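A minimal sketch of reading a frozen graph out of a .pb file with TensorFlow's compat API (the file name here is a hypothetical placeholder):

```python
import tensorflow as tf

# Parse a serialized GraphDef protobuf; "frozen_model.pb" is a placeholder path.
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile("frozen_model.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

# Each node in the protobuf is one op in the compute graph.
print([node.name for node in graph_def.node][:5])
```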

Frozen computation graph - a computation graph in which all the weights in the network are read-only

General Definitions

There is a glossary below with some of the key terms:

Python Code: This is the high-level code you write to define and train a TensorFlow model. It uses TensorFlow’s Python API to build computational graphs, define operations, and manage data flow.

Bytecode: When you run Python code, the Python interpreter compiles it into bytecode, which is an intermediate representation. This bytecode is then interpreted by the Python virtual machine (PVM). However, for TensorFlow, the Python bytecode mainly orchestrates the operations, rather than performing the heavy computations directly.

TensorFlow Operations (TF ops): These ops are defined in C++ and optimized for performance. The Python API is essentially a wrapper around these efficient implementations. Python code translates into a graph of TF ops, which can be executed independently of the Python runtime. The translation from Python code to these ops involves building this graph, which is then optimized and executed by TensorFlow’s runtime.

Assembly Code: The TensorFlow ops, once defined, are executed by the underlying hardware (CPU, GPU, or TPU). The TensorFlow runtime and libraries (written in C++ and CUDA for GPUs) convert these ops into assembly code, which the hardware understands directly. Assembly code is the low-level human-readable representation of the machine instructions.

Opcodes: These are the actual machine-level instructions that the CPU or GPU executes. They are derived from the assembly code and represent the fundamental operations supported by the hardware, such as arithmetic operations, memory accesses, and control flow instructions.

TensorFlow Definitions

Eager Execution: Eager execution is an imperative programming environment in TensorFlow that evaluates operations immediately, without building graphs. This mode is intuitive and easy to debug, as it executes operations step-by-step as they are called.

Graph Mode: Graph mode in TensorFlow involves building a computational graph of operations before executing them. This mode allows for optimizations and efficient execution, as the entire graph can be analyzed and optimized before running. It is the default mode for TensorFlow 1.x and can be used in TensorFlow 2.x via tf.function.
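A small sketch contrasting the two modes in TensorFlow 2.x:

```python
import tensorflow as tf

# Eager execution (the TF 2.x default): ops run immediately.
x = tf.constant([1.0, 2.0])
print(tf.square(x))  # result is available right away

# Graph mode via tf.function: the Python body is traced into a
# graph once, and the optimized graph runs on subsequent calls.
@tf.function
def square_sum(a, b):
    return tf.reduce_sum(tf.square(a) + tf.square(b))

print(square_sum(tf.constant(1.0), tf.constant(2.0)))  # 5.0
```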

TensorFlow Kernel: A TensorFlow kernel implements the core computation for a TensorFlow operation (op), usually in C++. Each operation in TensorFlow is associated with one or more kernels that handle the actual computation on different hardware types (CPU, GPU, etc.).

CUDA Kernel: A CUDA kernel is a function written in CUDA C/C++ that runs on NVIDIA GPUs. These kernels are executed in parallel across many GPU cores, providing massive computational power for tasks such as matrix multiplications and convolutions, which are common in TensorFlow operations.
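A sketch of writing and launching a CUDA kernel from Python using Numba (an assumption made for brevity; a CUDA C/C++ kernel is similar in spirit, and running this requires an NVIDIA GPU):

```python
from numba import cuda
import numpy as np

@cuda.jit
def add_kernel(x, y, out):
    i = cuda.grid(1)          # this thread's global index
    if i < out.size:          # guard against extra threads
        out[i] = x[i] + y[i]  # each thread computes one element

x = np.arange(1024, dtype=np.float32)
y = np.ones_like(x)
out = np.zeros_like(x)
threads_per_block = 256
blocks = (x.size + threads_per_block - 1) // threads_per_block
add_kernel[blocks, threads_per_block](x, y, out)  # runs in parallel on the GPU
```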

Concrete Function: a specific, optimized, and executable instance of a TensorFlow function defined using tf.function. A concrete function is created with specific input shapes and types, and it contains the compiled and optimized graph ready for execution. Another way to think about the concrete function is that it wraps the original graph function to make it differentiable / tracked by tf.GradientTape[^concreteFunction]
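A short sketch of getting a concrete function from a tf.function:

```python
import tensorflow as tf

@tf.function
def double(x):
    return 2 * x

# Tracing with a specific input signature yields a concrete function:
# a compiled graph specialized to scalar float32 inputs.
cf = double.get_concrete_function(tf.TensorSpec(shape=[], dtype=tf.float32))
print(cf(tf.constant(3.0)))  # 6.0
```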

Compilation Definitions

LLVM (Low Level Virtual Machine): a compiler toolchain used to optimize and generate efficient machine code for different hardware architectures

Intermediate Representation (IR): IR is an intermediate form of the computational graph or operations, which can then be optimized for specific hardware. In other words, the LLVM compiler needs some data structure to optimize, and this is it! Each optimization pass will update the IR, doing optimizations like loop unrolling, constant folding, and dead code elimination.
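To make one of those passes concrete, here is a toy constant-folding pass written against Python's own AST rather than a real compiler IR (illustrative only; LLVM's passes operate on its own IR, not Python):

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Fold additions/multiplications of literal constants."""
    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold children first
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            if isinstance(node.op, ast.Add):
                return ast.Constant(node.left.value + node.right.value)
            if isinstance(node.op, ast.Mult):
                return ast.Constant(node.left.value * node.right.value)
        return node

tree = ast.parse("y = 2 * 3 + x")
tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
print(ast.unparse(tree))  # y = 6 + x
```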

**HLO IR**: High Level Optimization IR, the graph-level intermediate representation that XLA operates on.
**MLIR**: Multi-Level IR, a compiler infrastructure within the LLVM project for building and transforming IRs. The XLA compilation process optimizes the IR over multiple passes.

AOT (Ahead-of-Time) Compilation: compiles code into machine code before runtime, baking the ops and weights into an executable for quick inference and reducing runtime overhead.

JIT (Just-in-Time) Compilation: compiles code into machine code at runtime for faster training. JIT compilation optimizes and executes operations on the fly, balancing the need for optimization with the flexibility to adapt to different runtime conditions and input data.
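In TensorFlow 2.x, XLA JIT compilation can be requested per function:

```python
import tensorflow as tf

# jit_compile=True asks TensorFlow to compile this function with XLA
# at runtime, fusing its ops into optimized machine code for the device.
@tf.function(jit_compile=True)
def scaled_dot(a, b):
    return tf.reduce_sum(a * b) * 0.5

print(scaled_dot(tf.constant([1.0, 2.0]), tf.constant([3.0, 4.0])))  # 5.5
```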

References

[^BytecodeMachineCode]: https://www.geeksforgeeks.org/difference-between-byte-code-and-machine-code/