Neural Network Optimization with TensorFlow and XLA, Overflow
overflow from the other pages
Organization of the series
- compute graphs
- python, tensorflow operations, protobuf
- general compilation
- graph tracing, graph lowering, autograph
- xla compilation and optimization
- IR, fusion, layout, scheduling
- tutorial on generating the graphs for manual inspection
- hardware
- registers, threads, warps, blocks, grids, SMs
- anatomy of an a10
- cuda kernels
- performance optimization with tensorboard
- trace viewer stage 1 analysis
- occupancy
- gpu architecture tuning
- memory, compute, bandwidth, occupancy
- use nvidia compute?
Questions
- what is an Eigen Meta Kernel?
- what do the various numbers of groupings correspond to?
- do the kern_precomputed_indices have anything to do with the synchronization issues?
- which mobileone interventions can i apply?
- what parts of the focalnet might be responsible for forcing synchronization? try removing things like GAP until