overflow from the other pages

organization of the series

  • compute graphs
    • python, tensorflow operations, protobuf
  • general compilation
    • graph tracing, graph lowering, autograph
  • xla compilation and optimization
    • IR, fusion, layout, scheduling
    • tutorial on generating the graphs for manual inspection
  • hardware
    • registers, threads, warps, blocks, grids, SMs
    • anatomy of an a10
    • cuda kernels
  • performance optimization with tensorboard
    • trace viewer stage 1 analysis
    • occupancy
  • gpu architecture tuning
    • memory, compute, bandwidth, occupancy
    • use nvidia nsight compute?
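
A rough sketch for the occupancy items above, tying together registers, threads, warps, blocks and SMs. The per-SM limits are assumptions taken from the published figures for compute capability 8.6 (the a10) and should be checked against the CUDA documentation. The function name and dictionary are hypothetical, and the model ignores register and shared-memory allocation granularity, so it gives an upper bound rather than what a profiler would report.

```python
import math

# Assumed per-SM limits for an A10 (compute capability 8.6).
# These are assumptions for illustration; verify before relying on them.
A10_LIMITS = {
    "warp_size": 32,
    "max_warps_per_sm": 48,
    "max_blocks_per_sm": 16,
    "registers_per_sm": 65536,
    "shared_mem_per_sm": 100 * 1024,  # bytes
}


def theoretical_occupancy(threads_per_block, regs_per_thread,
                          smem_per_block=0, limits=A10_LIMITS):
    """Estimate the fraction of an SM's warp slots a kernel can fill.

    Simplified: allocation granularity for registers and shared
    memory is ignored, so the result is an optimistic estimate.
    """
    warps_per_block = math.ceil(threads_per_block / limits["warp_size"])

    # Resident blocks allowed by each resource, taken independently.
    by_warps = limits["max_warps_per_sm"] // warps_per_block
    regs_per_block = regs_per_thread * limits["warp_size"] * warps_per_block
    by_regs = (limits["registers_per_sm"] // regs_per_block
               if regs_per_block else limits["max_blocks_per_sm"])
    by_smem = (limits["shared_mem_per_sm"] // smem_per_block
               if smem_per_block else limits["max_blocks_per_sm"])

    # The tightest limit decides how many blocks are resident.
    blocks = min(by_warps, by_regs, by_smem, limits["max_blocks_per_sm"])
    return blocks * warps_per_block / limits["max_warps_per_sm"]


# Register-limited case: 256 threads per block, 64 registers per thread.
occ_heavy = theoretical_occupancy(256, 64)
# Lighter kernel: 128 threads per block, 32 registers per thread.
occ_light = theoretical_occupancy(128, 32)
```

Under these assumptions the first kernel is capped by registers (4 resident blocks of 8 warps each), while the second can fill every warp slot. Comparing such an estimate with the achieved occupancy from a profiler is one way to decide whether registers, shared memory or block size is the limiting factor.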

Questions

  • what is an Eigen Meta Kernel?
  • what do the various numbers of groupings correspond to?
  • does kern_precomputed_indices have anything to do with the synchronization issues?
  • which mobileone interventions can i apply?
  • what parts of the focalnet might be responsible for forcing synchronization? try removing things like GAP until