One of my goals for 2025 is to get back into the swing of things, which means getting closer to hardware and staying in research mode. I used to attend many social events in the field I am passionate about, internetworking, and share my perspective. I have been in a long hibernation for the past 6 years, and I hope to get back to networking research in 2025. The @scale networking event on August 13 gives me a perfect restart. Through my data center networking social circle, I got invited to this much-anticipated networking event hosted by Meta. I plan to dedicate my entire day to observing and learning as much as possible.
Here is what I expect to learn:
- Has Meta introduced any new topology-level innovations? What did they learn from the different network topologies used to carry AI workloads? What comes after 5-stage + planes, Dragonfly, and Dragonfly+ topologies?
- How does Meta identify potential network-side bottlenecks from application-level metrics?
- How does Meta identify regular system-level problems and root-cause them?
- What does Meta do to stay ahead of ongoing networking innovations?
- Number of ports per SoC (system on chip): 1024 channelized ports on a single chip.
- Per-port bandwidth and aggregate per-chip bandwidth. The chip popularly known as TH6 (Tomahawk 6) is expected to deliver 102.4 terabits per second.
- Support for both scale-up and scale-out networks
- Chiplet architecture
- UEC (Ultra Ethernet Consortium) support
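
To see how the port count and chip bandwidth figures above relate, here is a minimal Python sketch. It assumes the aggregate bandwidth is split evenly across all ports, which is my simplification and not something a vendor datasheet states. The function name `per_port_gbps` and the 512-port configuration are hypothetical, used only for comparison.

```python
def per_port_gbps(chip_gbps: int, ports: int) -> float:
    """Return per-port bandwidth in Gb/s, assuming an even split (an assumption)."""
    if ports <= 0:
        raise ValueError("ports must be positive")
    return chip_gbps / ports


# Figures taken from the list above: 102.4 Tb/s aggregate, 1024 channelized ports.
CHIP_GBPS = 102_400  # 102.4 Tb/s expressed in Gb/s to keep the arithmetic exact
PORTS = 1024

if __name__ == "__main__":
    print(f"{PORTS} ports: {per_port_gbps(CHIP_GBPS, PORTS)} Gb/s per port")
    # Hypothetical alternative: half as many ports, each twice as fast.
    print(f"512 ports: {per_port_gbps(CHIP_GBPS, 512)} Gb/s per port")
```

Under that even-split assumption, 1024 ports on a 102.4 Tb/s chip works out to 100 Gb/s per port.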