Is there any truth to the trend of putting networking into Linux being the future of networking?

I am following the trend of Linux-based networking, open networking, disaggregation, web-scale... whatever you call it, and I wonder if this is a real disruptive change or not. Soliciting the wisdom of folks here on the impact of putting "Networking into Linux" on the future of networking. I'm not talking about the common "Networking in Linux" approach of hiding the Linux kernel with a wrapper, but real stuff that is done in the kernel, like these folks: https://cumulusnetworks.com/blog/vrf-for-linux/
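
To make that concrete, the kernel-native VRF work in that post boils down to ordinary iproute2 objects. A minimal sketch, driven from Python for illustration (the interface name, table number and route are hypothetical):

    # Minimal sketch of kernel-native VRFs via iproute2 (names/numbers hypothetical)
    import subprocess

    def sh(cmd):
        subprocess.run(cmd.split(), check=True)

    sh("ip link add vrf-blue type vrf table 10")          # VRF device backed by kernel table 10
    sh("ip link set dev vrf-blue up")
    sh("ip link set dev eth1 master vrf-blue")            # enslave a port to the VRF
    sh("ip route add 10.1.0.0/16 via 10.0.1.1 table 10")  # this route now lives in the VRF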

Does anyone see merit in natively integrating the router/switch base operating system with Linux, or in integrating the Linux community (people) with the data networking folks (CCIEs), a space earlier reserved for Cisco, Juniper and the like?

level 1

Linux and Unix have been part of networking since the very beginning. Juniper, the best hardware/software vendor in my book, runs off BSD, and Cisco runs their own Linux-based kernels on many types of equipment. The integration has been around for years.

level 2

The interesting thing is how much or how little some of them hide it.

Arista and Juniper have taken the "sure, here's the OS shell, knock yourself out" point of view, which gives the network admin some seriously powerful debugging tools.

In Arista's case, this goes to the extent of "yep, it's Fedora under here, install yourself some RPMs if you like", which gives you options for monitoring, programmability and all sorts of general mischief.

For others, you really have to dig into some obscure diagnostic commands to even find reference to the fact that there's the Linux kernel buried somewhere under there.

level 3

I've been thoroughly enjoying Arista's approach, especially their ZTP process. I've got custom bootstrap scripts fetching data from /etc/prefdl, making decisions on tweaking interfaces, and hitting callback URLs on Ansible Tower to kick off jobs against the switch in question. It's super neat.
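
Something like this, roughly - a hedged sketch of that bootstrap pattern (the prefdl parsing, Tower URL and config key are all made up):

    # Hedged sketch of a ZTP bootstrap: read factory data, then poke Ansible Tower
    # (Tower URL and config key are made up)
    import requests

    prefdl = {}
    with open("/etc/prefdl") as f:                # factory data: "Key: value" lines
        for line in f:
            if ":" in line:
                key, val = line.split(":", 1)
                prefdl[key.strip()] = val.strip()

    # provisioning callback: Tower matches the calling host and launches the job
    requests.post(
        "https://tower.example.com/api/v2/job_templates/42/callback/",
        data={"host_config_key": "not-the-real-key"},
    )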

level 3
Comment deleted · 3 months ago
level 4
JNCCNAIE · 8 points · 3 months ago

You're looking to software to be the difference. The difference is in the hardware. Networking gear has ASICs, so the performance will be great no matter what, as long as the drivers are done right. Commodity server hardware can't really compete.

As for the software, whether the networking code is in the kernel or not makes very little difference, IMO. The performance and functionality are in the hardware.

level 5
I break things, professionally. · 3 points · 3 months ago

I’ll agree that the performance is in the hardware, sure. But the functionality is just as much a part of the software stack as it is of the hardware’s capabilities. For example, you can have hardware that can do single-pass VXLAN encap/decap, but it really isn’t worth anything without a control plane that understands how to make a table of VTEPs and destinations and then actually program that into the hardware.
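
To illustrate the point: on the Linux side, "making a table of VTEPs and programming it" is just a VXLAN device plus FDB entries. A minimal static sketch (VNI and addresses are made up; a real control plane would learn these dynamically, e.g. via EVPN):

    # Minimal sketch: a static VTEP table on a Linux VXLAN device
    # (VNI and addresses are made up)
    import subprocess

    def sh(cmd):
        subprocess.run(cmd.split(), check=True)

    sh("ip link add vxlan100 type vxlan id 100 local 192.0.2.1 dstport 4789 nolearning")
    sh("ip link set dev vxlan100 up")
    for vtep in ("192.0.2.2", "192.0.2.3"):       # head-end replication flood list
        sh(f"bridge fdb append 00:00:00:00:00:00 dev vxlan100 dst {vtep}")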

I think OP has a good question regarding what’s more valuable: native support inside the Linux kernel versus proprietary services. At the end of the day, though, I think we are going to see proprietary services as king for quite a while longer. Even Cumulus, for all of their contributions to the kernel and FRRouting/Quagga, still hasn’t open-sourced switchd, which is the piece that makes the ASIC do what it needs to do.

level 5
Original Poster · -1 points · 3 months ago

A long time ago, you needed expensive hardware to run small things, and now we don't. Isn't that right? We need GPUs for ML today, but is that going to continue?

level 6
JNCCNAIE · 6 points · 3 months ago

Yes. High performance will always require specialized hardware.

level 7

The floor and ceiling of “high performance” rise with time. What was top-of-the-line performance in the ’60s has been available in calculators since the ’90s, to say nothing of the incredible computing power we all carry in our pockets these days.

level 8
JNCCNAIE · 1 point · 3 months ago

Yet there's still a demand for the latest iteration of infrastructure, whether it's networking equipment, servers, GPUs, etc.

Technology is iterative. All of it. That means newer iterations of one type of technology will increase demand for other types of technology. The "power we all carry in our pockets", for example, massively increases demand for bandwidth and higher performance networking equipment.

level 9

Yep, exactly. It's not like there's some maximum we'll reach, either. As we develop new technology to meet current demands, we unlock pent-up demand for the more capable technology, which drives more demand. It's a self-perpetuating machine.

level 7

There is a counterpoint to this. Network hardware isn't ever going to look like general purpose compute hardware but that doesn't mean that it won't ever be something more programmable. This is the same problem as GPUs all over again. At first they did absolutely nothing other than graphics and now GPUs are completely programmable via CUDA and OpenCL. I'm sure networking hardware will eventually go the same route with products like Tofino.

level 8
JNCCNAIE · 0 points · 3 months ago

In some ways, yes; however, ASICs are not programmable.

level 9

Tofino is fully programmable and supposedly already shipping. It runs at 6.5 Tbps in a single ASIC.

level 10
Cumulus co-founder · 2 points · 3 months ago

Tofino is very impressive and way more programmable than a Broadcom or Mellanox ASIC, but it is still far less flexible than a CPU or NPU. That said, it greatly increases what is possible on a line-rate ASIC.

A good way to think about Tofino is that it is like an FPGA, but with high-level parse, lookup, and modification blocks instead of low-level LUTs.

level 6
CCNA R&S | Avaya Tech · 4 points · 3 months ago

You don't need expensive hardware for it to work initially, but you do need expensive hardware if you want it to work with 20,000 hosts and multi-gigabit throughput. Software will never be able to keep up with Hardware. I'm sure that's written in the Bible somewhere.

level 7
Original Poster · 3 points · 3 months ago

Aren't distributed architectures useful for these kinds of problems?

level 8
Systems Engineer · 3 points · 3 months ago

It depends on whether the problem is better solved with distributed computing using commodity hardware, or centralized expensive purpose-built hardware. There are a lot of advantages to distributed computing using commodity hardware, but there are also additional complexities with distributed computing.

In short: it depends

level 3

And now Cisco's like, here, have an SDK, run python apps natively on the box. Oh and if you want a Bash shell, would you like the native shell or a disposable guest shell?

level 4

Can you install whatever libraries you want?

level 5

I believe so, or you can bundle the library with your app.

level 6

That's a good change - I've been investing a lot of time into creating 'network device native' Python scripts to enhance certain operational processes, so it's good to hear Cisco's coming along for that ride.
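
For anyone who hasn't seen them, those on-box scripts end up looking something like this (a sketch assuming the NX-OS-style `cli` Python module; the command and the filter are arbitrary examples):

    # Sketch of an on-box script, assuming the NX-OS-style "cli" Python module
    # (the command and the filter are arbitrary examples)
    from cli import cli

    output = cli("show interface counters errors")
    for line in output.splitlines():
        if "Ethernet" in line:
            print(line)    # or ship it to syslog/a webhook right from the box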

level 2

Juniper's base OS is now Linux, and their REs are BSD virtual machines running in QEMU.

level 3
It places the packet on the wire or else it gets the hose again. · 1 point · 3 months ago

I think the EX4600 was the first of their gear to run in that configuration.

level 2
Original Poster · 3 points · 3 months ago

I could have been more specific: "Networking into Linux" for me is when you have networking applications in Linux using kernel data structures and constructs, with the tool set and utilities of Linux.

level 2

Indeed quite a lot of networking products run Linux under the hood.

Makes sense: if you need an OS, then it's a ready-made and easily modifiable one. You would need a very strong reason to use or develop something else.

level 2
Original Poster · 0 points · 3 months ago

I'm not talking about the common "Networking in Linux" approach of hiding the Linux kernel with a wrapper, but real stuff that is done in the kernel, like these folks: https://cumulusnetworks.com/blog/vrf-for-linux/

level 1

Ivan Pepelnjak covered the question of using the Linux kernel as the control plane vs. using Linux as the OS running a user space control plane a few months ago:

http://blog.ipspace.net/2018/01/packet-forwarding-on-linux-on-software.html

It was a good episode, featuring some of the developers who extended Linux to make the use case truly workable.

I'm convinced. Linux control plane all the things!

level 2
Original Poster · 1 point · 3 months ago

How about the data plane?

level 3

What about it? Putting the Linux kernel in the forwarding path implies not using hardware acceleration, so I'll pass on that, I guess.

level 4
Original Poster · 1 point · 3 months ago

That's exactly what I am interested in. Can we do forwarding efficiently in the Linux kernel, with hardware acceleration? Is it a matter of time before these closed APIs from Broadcom and Cavium open up?

level 5

Can we do forwarding efficiently in the Linux kernel, with hardware acceleration?

"forwarding in the Linux kernel" and "with hardware acceleration" are mutually exclusive in my book.

We may have a misunderstanding about terms here.

I'd describe Cumulus as having an external, hardware accelerated data plane managed by the Linux kernel.

A frame/packet transiting the front panel of a Cumulus switch never touches the kernel and, other than pulling stats from the underlying hardware, the kernel will never know that packet went by.

level 5
CCIE #1937 · 3 points · 3 months ago

You may find this post informative. There's apparently a lot of effort being put into high performance packet forwarding on generic hardware, and they've got some very impressive results.

level 5
2 points · 3 months ago

You can do something that feels pretty close to the same thing. Cavium has for years maintained a Linux kernel module that transparently offloads most forwarding traffic onto their ASIC, and configuration is just bog-standard Linux networking. You set up your firewall rules in iptables, and behind the scenes that kernel module translates them into an equivalent configuration in hardware. Any fancy features that the hardware doesn't support, like QoS, just get forwarded to the kernel and processed as usual. Eventually I'm sure the community will standardize on an API that can be supported by the kernel natively.
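
In other words, the admin-facing side is nothing special - the module watches completely ordinary rules like these and mirrors what it can into hardware (rules and interface names are illustrative only):

    # Illustrative only: bog-standard iptables config that such a module can offload
    import subprocess

    rules = [
        "iptables -A FORWARD -i wan0 -o lan0 -m state --state ESTABLISHED,RELATED -j ACCEPT",
        "iptables -A FORWARD -i lan0 -o wan0 -j ACCEPT",
        "iptables -P FORWARD DROP",                # default policy for anything else
    ]
    for rule in rules:
        subprocess.run(rule.split(), check=True)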

level 6

The Mellanox guys have upstreamed switchdev into the kernel. This provides a vendor-neutral mechanism to sync the FIB between the kernel and the hardware.
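
A nice side effect is that the sync is visible with stock tools: routes the ASIC has absorbed show up flagged as "offload" in regular `ip route` output. A quick sketch:

    # Quick check of switchdev's FIB sync: offloaded routes carry the "offload" flag
    import subprocess

    routes = subprocess.run(["ip", "route", "show"],
                            capture_output=True, text=True).stdout
    for line in routes.splitlines():
        where = "in hardware" if "offload" in line else "kernel only"
        print(f"{where}: {line}")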

level 3

To what extent? Hardware computes in the dataplane, not software. Obviously the hooks between planes require software, but if you care about scale and performance, forwarding is a hardware task. Current ASICs do line rate processing of ACLs and L4 DPI. Sure, there are some insane instances of high scale throughput, but they're feature poor or built for lab environments.

level 4
Original Poster · 1 point · 3 months ago

What are the architecture and design pros and cons of using a userland process like VPP versus the way things are done in Pica8 and Cumulus with Mellanox and Broadcom? Which one performs better and has decent feature velocity? Has anyone compared these two?

level 5

If you use the general processor for packet handling, what happens when something causes the processor to stop and ponder its life? An ASIC would drop the packet, but a processor can get consumed by other tasks unrelated to forwarding packets. This is why packet forwarding is handled on the ASIC, and other tasks get escalated to the processor on an as-needed basis.

level 6
Original Poster · 0 points · 3 months ago

The same thing we do when any type of server is interrupted: we rely on service high availability to work around bugs. We can't expect a perfect network that doesn't skip a beat, because it can and it will, just like any other piece of software out there. Are we restricting ourselves to hardware solutions with these ideas?

level 7

I'm just stating that it is most efficient to handle edge connections at the edge, and only escalate to the core proc when necessary. There are massive trade-offs to routing packets at the core. 15 years ago, all routing was done at the core - and therefore routing packets took milliseconds, whereas switching was nanoseconds. Now that routing is done on silicon we have routing happening at wire-speed.

If you bring all traffic to the kernel, there is no two ways about it, it will be slower - unless you find a way to slow time or speed up light.

level 5
Cumulus co-founder · 1 point · 3 months ago

All the below are rough estimates based on when I last looked into this back in 2016, but nothing has changed radically since then.

An ASIC like Mellanox or Broadcom will always, or almost always be able to do all features at full line rate. For the same power budget, that rate will be approximately 10-20x faster than an NPU.

An NPU will outperform a general-purpose CPU by 4-10x, depending on how aggressively you optimize the code for the CPU. The 10x is if you just use the normal Linux/BSD forwarding code. The ~4x is if you write custom code using something like DPDK.

So why do people use customized CPU forwarding? If you just need a router with three 10G interfaces, x86 servers are cheap and readily available and can almost always keep up. And they have a much richer feature set than fixed-function ASICs.

Why do people use NPUs instead of ASICs? Flexibility. The feature set can be much greater than that of fixed-function ASICs.

Some of these lines are blurring a bit with semi-programmable ASICs like Cavium, and especially with Barefoot's Tofino ASIC.

level 1
Meow 🐈🐈Meow 🐱🐱 Meow Meow🍺🐈🐱Meow A+! · 2 points · 3 months ago

Putting it into the control plane or data plane?

level 2

^

level 2
Original Poster · 1 point · 3 months ago

Both.

level 3
Meow 🐈🐈Meow 🐱🐱 Meow Meow🍺🐈🐱Meow A+! · 3 points · 3 months ago

Then yes.

level 1

Two things:

Infighting

Support (Especially for enterprise customers)

level 1

Along with everything being based on Linux, I use Linux-based VyOS VMs for routing as well.

level 1

It's a little hard for me to follow what exactly you're asking, but I'll give you an example from both directions to illustrate what I think is useful.

First, using a Linux device as both the route/switch/whatever and the application server. This is extremely useful because of the massive levels of scale that can be accomplished with what is effectively commodity network and server equipment. Instead of purchasing routers, firewalls, servers, and all of that, you can simply connect fabrics of switches together, terminate your transit at whatever layer you like within that fabric, and those Linux devices can pretty much handle the rest. They can do all of the BGP announcements, they can do VRRP between each other if that's your thing, you can use ECMP to each host, you can do clever network tricks to handle dropping a request into a mesh of servers and only having a single one respond, and so on. Basically you limit your equipment spend, eliminate major bottlenecks and chokepoints, and so on. Additionally, you get to use all of your normal config management tools like Ansible, Chef, Puppet, or whatever you like to manage the only devices you actually maintain: Linux servers. This is, at a high level, how Facebook and Google have scaled their networks so well, so quickly, and at a significant cost savings over what would have otherwise been designed in a more traditional setting.
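
As a taste of the "they can do all of the BGP announcements" piece, here is a hedged sketch of advertising an anycast service address straight from a host via FRR's vtysh (ASNs and addresses are made up):

    # Hedged sketch: advertise an anycast /32 from a Linux host via FRR's vtysh
    # (ASNs and addresses are made up)
    import subprocess

    commands = [
        "configure terminal",
        "router bgp 65001",
        "neighbor 10.0.0.1 remote-as 65000",
        "address-family ipv4 unicast",
        "network 192.0.2.10/32",
    ]
    args = ["vtysh"]
    for c in commands:
        args += ["-c", c]          # vtysh runs the -c commands in one session, in order
    subprocess.run(args, check=True)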

On the other hand, using Linux inside white and gray box devices like Arista or Cumulus switches is also massively powerful. First, you get the Linux networking stack, which has its positives and negatives. VLAN membership, for example, isn't global on the device, so you can have multiple VLANs with the same ID without conflict. For virtual networking, or datacenter networking where tenants might make some changes (SoftLayer, Rackspace, DigitalOcean for example), this means your customers won't have to come up with their own IDs and avoid conflicts, which is a pretty big win. Administrators also understand how the networking gear will work, and the tools to interact with it, which is nice. But none of that compares with the ability to run custom software on purpose-built network hardware running Linux. Imagine you have a Cumulus or Arista switch, and you want to do some random thing on the device. Traditionally, you'd be out of luck, but with these Linux network appliances, you just write your software, load it onto the device, and run it! Services, applications, whatever. And you can get access to the accelerated networking components, which means your service can handle traffic at blazing fast speeds.

As a final example of where this convergence is going: XDP, DPDK (and its competitors), af_packet, etc. all serve to massively accelerate packet handling in several situations, and can be extended to various levels of complexity. By spreading data intelligently across buffers, CPUs, and host NUMA architectures, and accelerating it with the various fast-pathing tools that are quickly maturing, devices are seeing orders-of-magnitude better performance. Hosts that struggled to manage traffic levels over 10Gbit without specialized drivers and hardware can now handle multiple 100G ports without much trouble. With some real focus and attention, companies using these technologies are being rewarded with huge performance increases, which means better customer experiences and fewer hardware dollars spent. And all of that ties back into the previous two examples. With these tools, easily affordable commodity hardware is performing on the same level as purpose-built chipsets and fabrics. And all in an operating system that tens of thousands of people have administrative experience with. That's industry-changing stuff right there.
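
For a sense of what the XDP end of this looks like, here is a toy sketch using BCC that counts packets at the driver hook, before the kernel stack ever allocates an skb (the interface name is hypothetical; needs root):

    # Toy XDP sketch via BCC: count packets at the driver hook (interface hypothetical)
    from bcc import BPF

    b = BPF(text=r"""
    #include <uapi/linux/bpf.h>
    BPF_ARRAY(pkt_count, u64, 1);
    int xdp_counter(struct xdp_md *ctx) {
        int key = 0;
        u64 *val = pkt_count.lookup(&key);
        if (val) __sync_fetch_and_add(val, 1);
        return XDP_PASS;   /* let the packet continue into the normal stack */
    }
    """)
    fn = b.load_func("xdp_counter", BPF.XDP)
    b.attach_xdp("eth0", fn, 0)
    # ... read b["pkt_count"] periodically, then b.remove_xdp("eth0", 0) on exit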

level 1
[deleted]
1 point · 3 months ago

I think this gives you much more ability in an operational and monitoring sense - running native packet sniffers/Wireshark, for instance - though it adds complexity as well. I see the trend shifting a bit away from Cisco and the likes.

level 1

I'd see merit in this approach for better 'route to the host' support.

I've had to handle shoving things into multiple namespaces before and it's a PITA to manage.

Pretty good example: https://stackoverflow.com/questions/28846059/can-i-open-sockets-in-multiple-network-namespaces-from-my-python-code#28865626
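
The core of the trick from that answer: setns() into the target namespace just long enough to create the socket, since a socket stays bound to the namespace it was created in. A hedged sketch (the namespace name is hypothetical; needs root):

    # Hedged sketch of the linked trick: sockets stay in the netns they were created in
    # (namespace name is hypothetical)
    import ctypes, os, socket

    libc = ctypes.CDLL("libc.so.6", use_errno=True)

    def socket_in_netns(ns_name):
        orig = os.open("/proc/self/ns/net", os.O_RDONLY)
        target = os.open(f"/var/run/netns/{ns_name}", os.O_RDONLY)
        try:
            if libc.setns(target, 0) != 0:        # enter the target namespace
                raise OSError(ctypes.get_errno(), "setns failed")
            return socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        finally:
            libc.setns(orig, 0)                   # hop back to the original namespace
            os.close(target)
            os.close(orig)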

level 1

I was really hoping we'd see more native kernel development with the Broadcom/Intel ASICs. If they were natively supported, I could then just buy a switch and install Red Hat on it.

level 2
The best TCP tuning gets you UDP behaviour. So we just use UDP. · 5 points · 3 months ago

You can already, if you buy a switch with a supported ASIC and put e.g. Cumulus on it. But because the API to the ASIC is proprietary, this will not make it into more general Linux distributions. I think the API for some Mellanox switches is fully open, but you would still need to put Cumulus, or a more open but switch-centric OS on it to have it work as a switch.

level 3

I mean, this is Arista. It's Fedora Linux, and literally the only custom kernel module is the Broadcom ASIC driver.

level 3

Yeah, I don't want to use Cumulus.

level 4
Original Poster · 0 points · 3 months ago

is there a better way today?

level 5

No, and that's kind of my point. You can use either Cumulus or the hardware vendor's OS; that's it. Whereas on a server I can install anything that runs on x86.

level 6

There's more than Cumulus supporting ONIE boot and related functionality these days - http://www.opencompute.org/wiki/Networking/ONIE/NOS_Status

https://www.pica8.com/product/

https://www.pluribusnetworks.com/products/white-box-os/

Not a huge list, I'll admit.

Programming good features that aren't just poking instructions at Broadcom's SDK is a) hard and b) really hard.

Broadcom doesn't just hand the thing out, either :/

level 6
Original Poster · 1 point · 3 months ago

I guess (correct me if I am wrong) you can install Cumulus OS for free on any x86 box, but you will only get limited performance (~70-80Gb on one box), and they give you the option to purchase special hardware to go higher. This could improve if someone were able to do things faster in the kernel with more features, and that is what companies like Cumulus are trying to do.

level 7

To answer your question directly, yes I like what Cumulus is doing. However, I do not like the fact that it's only Cumulus doing it.

level 8
obsessed with NetKAT · 2 points · 3 months ago

what? Cumulus is a control plane that speaks to a switching ASIC -- like literally everyone else.

Cumulus does their own Linux-to-ASIC driver, instead of what many other whitebox switch control plane vendors do, which is just %s/ASIC_SUPPLIER/VENDOR/g on the reference driver.

level 9

Like I said previously, I would rather the drivers be native to the Linux kernel. However, as someone else pointed out the API is proprietary so it most likely isn't compliant with the GPL.

level 10
obsessed with NetKAT · 1 point · 3 months ago

from a performance perspective, why does that matter at all?

don't expect any switching ASIC vendor to mainline drivers. it's not gonna happen. NIC drivers just interface with a proprietary firmware API. everything happens in a "black box" there. it's the same thing with switching ASICs from Broadcom, Mellanox, Cavium, etc.

https://github.com/Broadcom-Network-Switching-Software/OpenNSL-Tool-Suite

level 8
Original Poster · 1 point · 3 months ago

Why is it only Cumulus? Is it because they have the talent, or because no other company wants to take that approach? Do Pica8, Big Switch and Arista follow a similar solution?

level 7
obsessed with NetKAT · 1 point · 3 months ago

what?

many of the highest-throughput network stacks are userland. it's very tough to break 40Gbps per core at line rate, for a number of reasons -- namely, you only get so many instruction cycles per packet, plus PCIe bandwidth.
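
the cycle budget falls out of simple arithmetic - a back-of-the-envelope sketch (assuming minimum-size frames and one 3 GHz core):

    # back-of-the-envelope cycle budget (assumes min-size frames, one 3 GHz core)
    line_rate = 40e9                   # 40 Gbps
    frame_bits = (64 + 20) * 8         # 64B minimum frame + preamble/IFG on the wire
    pps = line_rate / frame_bits       # ~59.5 million packets per second
    cycles_per_packet = 3e9 / pps      # ~50 cycles for *all* per-packet work
    print(f"{pps/1e6:.1f} Mpps -> {cycles_per_packet:.0f} cycles/packet")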

level 8
Original Poster · 1 point · 3 months ago

So it's in userland because of architectural limitations. Does that mean we should only use these open networking solutions for ToR switches and not for Clos-type fabrics (bigger purposes)?

level 8
packetpushers.net · 1 point · 3 months ago

You can improve this by using a fancy NIC that has an onboard FPGA/CPUs or ASIC to get more than 100G (some up to 200G) today.

level 6
packetpushers.net · 1 point · 3 months ago

There are many people who roll their own Linux.

Or you can go with OpenSwitch (https://github.com/open-switch) or Open Network Linux (https://opennetlinux.org/).

level 2

hoping we'd see more native kernel development with the Broadcom/Intel ASICs

Not possible. The kernel is open source, while the SDKs from the ASIC vendors come only under NDA. That's why switchd is only available as a binary.

level 3
Original Poster · 1 point · 3 months ago

What if those SDKs were open-sourced? What would it take for a genius to reverse-engineer that shit and set networking free? :)

level 4

That's an expensive proposition that won't result in a stable/sellable product, but will result in endless lawsuits. I don't see that happening.

Having said that, I don't understand why the ASIC vendors hide their cards like they do. It's not like making a switching ASIC is that hard... Not that I'd even know where to begin, but Barefoot only needed a few years to crank out a truly revolutionary chip that was (briefly) faster than anything else on the market.

My guess is that Broadcom hides their SDK for two main reasons:

  • Habit

  • Maintaining this barrier keeps their customers (Cisco, etc...) happy precisely because it doesn't "set networking free"

level 5

It's not like making a switching ASIC is that hard...

I don't have any in-depth knowledge of the topic, but I'm going to take a wild guess and say it IS hard but it's probably due to patents more than anything.

level 6

Heh. I don't mean to trivialize it, but it's been recently demonstrated that a young company, starting from scratch, can totally do this.

level 5
I break things, professionally. · 1 point · 3 months ago

I imagine it's to lock in OS vendors. If you're Broadcom, and Cavium came along with a cheaper chipset that did the same things and had the same API, there would be pretty much nothing to stop your customers from jumping ship.

It would be great for the consumer, though.

level 2
packetpushers.net · 2 points · 3 months ago

You can do this today. There are several companies that roll their own Linux for switches and routers. The hardware ASICs do the forwarding; Linux programs the ASICs while handling all the management and operations needed to gather route data, monitor the system, etc.

level 2

I am in no way associated with them, but I will suggest that you can buy a Mellanox Spectrum-based switch and, using their ONIE install mechanism, install a plain-jane recent Linux kernel on it and make use of the hardware-based forwarding. You'll need to load the switchdev kernel module. Switchdev is integral and is delivered with the kernel; it is not a binary blob such as what is needed to run Broadcom ASICs. In any case, switchdev takes care of syncing the kernel's forwarding information base with the hardware. The road map of implemented features is reasonably substantial. You can then use software like Free Range Routing and/or Open vSwitch and/or iproute2 to handle control plane operations, and leave the forwarding plane to the hardware.
