Tuesday, August 12, 2025

@scale networking event to learn from Meta networking researchers

One of my goals for 2025 is to get back into the swing of things, which means getting closer to hardware and staying in research mode. I used to attend many social events in the field I am passionate about, internetworking, and share my perspective. I have been in a long hibernation for the past 6 years and hope to get back to networking research in 2025. The @scale networking event on August 13 gives me a perfect restart. Through my data center networking social circle, I got invited to this much-anticipated networking event by Meta. I plan to dedicate my entire day to observing and learning as much as possible.

What are my expectations?

  1. Have there been any new topology-level innovations at Meta? What did they learn from the different network topologies used to carry AI workloads? What comes after 5-stage + planes, Dragonfly, and Dragonfly+ topologies?
  2. How does Meta identify potential network-side bottlenecks from application-level metrics?
  3. How does Meta identify regular system-level problems and root-cause them?
  4. What does Meta do to stay ahead of ongoing networking innovations?

Networking plays a crucial role in making AI faster and better. Networking products and services have grown exponentially in recent years. In particular, Broadcom's recent announcements on TH6 (Tomahawk 6) include some previously unheard-of scale figures, such as:

  • Number of ports per SoC (system on chip): 1024 channelized ports on a single chip.
  • Per-port bandwidth and aggregate per-chip bandwidth: TH6 is expected to deliver 102.4 terabits per second.
  • Support for both scale-up and scale-out networks
  • Chiplet architecture
  • UEC (Ultra Ethernet Consortium) support
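
To put the two headline numbers above side by side, here is a rough back-of-the-envelope sketch. It assumes the 102.4 Tbps aggregate is split evenly across all 1024 channelized ports, which is my own simplification for illustration and not something the announcement states in this form. The function name is made up for this example.

```python
# Figures come from the list above; the even split across ports is an assumption.
TOTAL_TBPS = 102.4   # aggregate chip bandwidth in terabits per second
PORTS = 1024         # channelized ports per chip


def per_port_gbps(total_tbps: float, ports: int) -> float:
    """Return bandwidth per port in Gbps, assuming an even split across ports."""
    return total_tbps * 1000 / ports


if __name__ == "__main__":
    print(f"{per_port_gbps(TOTAL_TBPS, PORTS):.0f} Gbps per port")
```

Under that assumption, each channelized port works out to roughly 100 Gbps.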

P.S. Thanks to my 10-year-old son Advik for proofreading.