December 24, 2019

NetOps Coding 201 - Building Facebook's FBAR for Network Devices

Talk Title NetOps Coding 201 - Building Facebook's FBAR for Network Devices
Speakers David Swafford (Facebook)
Conference NANOG66
Location San Diego, California
Date Feb 8 2016 - Feb 10 2016
URL Talk Page
Slides Talk Slides
Video Talk Video

Continuing with our theme of automating day-to-day operations, we’ll dive into building your very own FBAR! FBAR is a system used by Facebook to handle server and network fault detection and repair, which offsets much of our traditional NOC through software. While FBAR on its own is a massive system tightly integrated into all aspects of FB infrastructure, we’re going to start fresh here and build a simplified version that focuses on the network side. The version we’ll build follows the same model: parse standard syslog messages into faults / events, then run remediation scripts against those to further diagnose and potentially repair or mitigate the issue (for example, moving traffic away from a bad path by changing BGP policy so that it’s no longer user impacting).

In NetOps Coding 101, we focused heavily on regular expression parsing to build two example remediation scripts. We’ll use those and additional ones here, but the focus of this session will not be on regular expressions or additional remediation scripts. Instead, we’ll focus on the system itself. The system we’ll build should be more production ready by the end. It’ll be structured so that we can act on and remediate many devices at a time (learning Python topics such as threading, queues, and parallelism in general). We’ll also focus on how to keep track of events, actions taken, and their results (touching on the topics of storing and querying data), because we don’t want to continuously run the same remediation script all day on the same device!

Note: This session builds on “NetOps Coding 101”, but attendance of that is not required. If you have the basics of Python and regex parsing down, you’ll be right at home! Come have fun and hack with us, and walk away with the knowledge to automate the mundane and shift into the new hybrid network engineer!
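
To make the model in the abstract concrete, here is a minimal sketch of such a pipeline in Python. It is not Facebook's FBAR code or the code from the talk; it only illustrates the three ideas mentioned above: parsing syslog lines into faults, fanning faults out to worker threads through a queue, and recording actions so the same remediation is not repeated all day. The simplified syslog format, the names `parse_fault`, `remediate`, `ActionLog`, and `run`, and the one-hour cooldown are all assumptions made for illustration.

```python
import queue
import re
import sqlite3
import threading
import time

# Assumed, simplified syslog format: "<device> %LINK-3-UPDOWN: Interface <name>, changed state to down"
FAULT_RE = re.compile(
    r"^(?P<device>\S+) %LINK-3-UPDOWN: Interface (?P<interface>\S+), changed state to down$"
)
COOLDOWN_SECONDS = 3600  # assumed: skip a fault already handled within the last hour


def parse_fault(line):
    """Turn one syslog line into a fault dict, or return None if it does not match."""
    match = FAULT_RE.match(line.strip())
    if match is None:
        return None
    return {"device": match["device"], "fault": "link_down", "detail": match["interface"]}


def remediate(fault):
    """Hypothetical placeholder; a real script would log in to the device and diagnose."""
    return "checked {} on {}".format(fault["detail"], fault["device"])


class ActionLog:
    """Stores actions in SQLite so the same fault is not remediated over and over."""

    def __init__(self, path=":memory:"):
        # One shared connection, with every access serialized by a lock.
        self._db = sqlite3.connect(path, check_same_thread=False)
        self._lock = threading.Lock()
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS actions ("
            "id INTEGER PRIMARY KEY, device TEXT, fault TEXT, detail TEXT, result TEXT, ts REAL)"
        )

    def try_claim(self, fault, now):
        """Record a pending action and return its id, or None if one ran recently."""
        key = (fault["device"], fault["fault"], fault["detail"])
        with self._lock:
            recent = self._db.execute(
                "SELECT 1 FROM actions WHERE device = ? AND fault = ? AND detail = ? AND ts > ?",
                key + (now - COOLDOWN_SECONDS,),
            ).fetchone()
            if recent is not None:
                return None
            cursor = self._db.execute(
                "INSERT INTO actions (device, fault, detail, result, ts) VALUES (?, ?, ?, 'pending', ?)",
                key + (now,),
            )
            self._db.commit()
            return cursor.lastrowid

    def finish(self, action_id, result):
        """Store the outcome of a remediation that was claimed earlier."""
        with self._lock:
            self._db.execute("UPDATE actions SET result = ? WHERE id = ?", (result, action_id))
            self._db.commit()

    def results(self):
        with self._lock:
            return self._db.execute("SELECT device, detail, result FROM actions ORDER BY id").fetchall()


def worker(faults, log):
    """Pull faults off the queue until a None sentinel arrives."""
    while True:
        fault = faults.get()
        try:
            if fault is None:
                return
            action_id = log.try_claim(fault, time.time())
            if action_id is not None:
                log.finish(action_id, remediate(fault))
        finally:
            faults.task_done()


def run(lines, num_workers=4):
    """Parse syslog lines, remediate the resulting faults in parallel, return the action log."""
    faults = queue.Queue()
    log = ActionLog()
    threads = [threading.Thread(target=worker, args=(faults, log)) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for line in lines:
        fault = parse_fault(line)
        if fault is not None:
            faults.put(fault)
    for _ in threads:
        faults.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return log.results()
```

Calling `run()` with a list of syslog lines returns one `(device, interface, result)` row per distinct fault. A repeated line for the same device and interface is skipped because `try_claim` finds the earlier row inside the cooldown window. The check and the insert happen under one lock, so two workers cannot both claim the same fault. Row order depends on thread scheduling, so sort the rows before comparing them.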
