When thinking about purpose-built systems, it’s easy to focus on the high-visibility consumer products—the iPods, the TiVos. Lying in the shadows of the corporate data center, however, are a number of less-glamorous devices built primarily to do one specific thing—and do it well and reliably.
Few engineers have more experience building these systems than Chuck McManis of NetApp (Network Appliance Inc.). Prior to joining NetApp, McManis worked at FreeGate, a company he started with two colleagues that built an embedded server appliance for the SMB (small and midsized business) market. McManis joined NetApp in 2001, where he has been driving the scalability agenda as a senior technical director. In that role, he is deeply involved with developing NetApp’s line of NAS (network-attached storage) appliances for both enterprise and SMB customers. “We’re not a server company; we’re an appliance company,” McManis is quick to remind us.
Interviewing McManis is Queue editorial board member George Neville-Neil, who, when not edifying and entertaining Queue readers through his Kode Vicious persona, works on network and operating system code for fun and profit. Neville-Neil is no stranger to the purpose-built world, having spent more than 10 years of his career building embedded systems. He is also the co-author, with Marshall Kirk McKusick, of The Design and Implementation of the FreeBSD Operating System (Addison-Wesley Professional, 2004), as well as a contributor to FreeBSD.
GEORGE NEVILLE-NEIL You have quite a bit of experience in the appliance space. Can you tell us a little more about some of the work you’ve done leading up to NetApp?
CHUCK McMANIS In 1996 I started a company with Jean Gastinel and Jean-Marc Frailong (formerly of Sun and Xerox) called FreeGate. At the time, small businesses attempting to connect to the Internet faced a number of hurdles that were insurmountable for the nontechnically oriented (think pizza parlors), and service calls were too expensive for ISPs. The FreeGate Gateway appliance allowed lights-out management in the business phone closet with built-in ISDN connectivity and optional T1/E1 connectivity. The appliance provided complete Internet point of presence with mail, Web, FTP, DNS, and VPN outward facing, NAT, firewall, and NAS inward facing. I was the director of systems, later chief architect, and stayed with that company from its inception to its acquisition by Tut Systems in early 2000.
Tut had a horrible 2001 after the collapse of the DSL market, so I left Tut to join Network Appliance in August 2001. This job has involved developing an entirely new architecture and model for providing storage services and a systems architecture around commodity hardware to implement that architecture. This work contributed to NetApp’s decision to acquire Spinnaker Networks in 2002/2003 and is the base architecture of NetApp’s next-generation systems.
GN-N How was working on the FreeGate appliance with the kind of software you had at the time different from building the current appliances you’re working on at Network Appliance?
CM It’s very similar, and one of the things that has really struck me about building these things is that the process you go through is common to all applications. You basically have a platform that has certain capabilities, and if you’re using off-the-shelf platforms, you have a choice among individual platforms but not among individual features. You get a collection of features.
In both FreeGate’s case and NetApp’s case, we have a customization step. With FreeGate, the customization requirement primarily involved size. We wanted a small compact board. In the case of NetApp, we wanted more I/O ports than a typical x86-type PC would have. In both cases, there are a number of third parties who are skilled in manufacturing motherboards that have the standard PC feature set, so there are a lot of choices. If you go out and say, “I need a motherboard that has two Ethernets and six ports and six drive connectors on it,” a lot of people can respond to an RFP like that very proactively.
From a software development standpoint, in the ’90s the open software movement had just gotten started. The Linux kernel was not as big as it is now. It was still frothing along, and there was a rather small number of people in the industry who had come from systems companies like Sun and HP and places that had a background in those technologies. But now at Network Appliance we can actually find people just out of college who have been doing kernel development for five years.
GN-N In terms of the software platforms you’re working with, how much open source software do you wind up using? What’s the mix when you’re developing a new appliance?
CM I think it’s about 40 percent from an externally acquired source and 60 percent from an internally generated source. Again, that ratio is pretty consistent with what I experienced at FreeGate, primarily because there are a small number of things like the TCP/IP stack that have not really changed in the last 15 years—except for adding IPv6.
For other things however, such as CIFS (common Internet file system), which is Microsoft’s disk-sharing protocol, there was no open source equivalent that had the level of capabilities that we needed for our customers, so we pretty much developed our CIFS stack entirely in-house from a blank sheet of paper.
All in all, I find that the open source stuff is a mixed blessing. On the one hand, you might be able to get a leg up on something. On the other hand, because it’s being modified by people outside your domain of influence, or certainly outside your domain of control, you’re stuck with either picking a point and modifying that, staying with it, and stabilizing it, or hiring a bunch of people to absorb all those changes. If you’re doing an embedded appliance, using these open source distributions such as Linux or FreeBSD doesn’t really buy you a whole lot.
GN-N When you work on something like a CIFS implementation, do you wind up putting that back into the open source pool as a way of making sure it remains as part of a core system, or do you just keep it in-house?
CM Well, it depends. The challenge with CIFS has always been that Microsoft has never really considered it to be an open protocol. In the early days, most of the work that we had done on it was reverse engineering what we saw on the network and saying, “Oh, that must be how it does that.”
Later on, Network Appliance was one of the companies that licensed the standard set of core protocols from Microsoft that gave us access to the data we needed. The open source version of CIFS is Samba, which I’m sure you’re familiar with, and it has been under constant threat of patent infringement claims or some other action by Microsoft. We have a little bit of an issue with that, too. If you have spent significant resources and time developing a product, you really don’t want to give that away if it means you put all your investment at risk. We’ve always been proactive about sharing fixes to existing open source that we’ve used or that we’ve looked at. We’ve been very active in the NFS (network file system) community, helping the Linux guys develop an enterprise-class NFS client.
By the same token, our operating system evolved out of an interrupt service routine, and there’s not really a lot of incentive for us to put that in open source because it’s really not applicable to people who aren’t building storage appliances.
GN-N Which would be your competitors, of course.
CM Right. I’m sure EMC would love to browse through it.
GN-N As with all purpose-built devices, at Network Appliance you are trying for the whole five nines thing, or 99.999 percent uptime. Things have to be up all the time. You don’t want to have to administer the machine the way you administer a traditional BSD or Linux box. How much work do you wind up having to do to off-the-shelf software to make that happen?
CM We do a tremendous amount of work, and I’d say one of our core values from day one has been our simplicity. It’s a fairly complicated thing internally to have a system come up and figure out which disk drives are attached and construct RAID groups out of those drives and then present them. But all of that complexity has been encapsulated in a very simple, easy-to-understand way for our customers, who can talk to our boxes in terms of volumes and protocols.
So when you get a NetApp box and you turn it on, you can just say, “Vol create,” and you’ll get a volume. By default it will use up all the disks that are attached to it, and it will save the spare—it will do best practices. That’s been one of our key benefits. A customer buys our equipment and sets it up. A couple of hours later, it’s up and running—which is not the traditional mode.
When we entered the market, the traditional SAN vendor, which would have been EMC at the time, had a lot of knobs exposed, and so you didn’t set it up; you had a team of people come in and set it up.
That kind of appliance model has been very straight. Now to be perfectly honest, that model has been taken to almost an extreme by Linksys. Linksys has a little network-attached storage box that uses USB drives. In this case, you plug it in and go to a Web browser portal and say, “Go.” It doesn’t get much simpler than that.
That’s a level of simplification—although the question I thought you were going to ask was how do we get five nines or six nines out of basically the same kind of hardware that you reboot every day at home? The answer is, it’s not quite the same hardware. We use the server chipsets and what Intel and AMD and the hardware people think of as being the server-class gear, which is not their very inexpensive value-engineered gear. They differ in the amount of error checking they do internally and often on how many PCI buses you can configure on them. Also, we have a tremendous amount of investment in what I would call software data protection—protecting customer data with check codes within the system to verify that it has not been damaged or modified in any way as it goes from the disks to the network port.
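The idea of a check code that travels with the data can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not NetApp’s implementation: it uses CRC-32 from the standard library as a stand-in check code, and the names `write_block` and `read_block` are hypothetical.

```python
import zlib


def write_block(data: bytes) -> tuple[bytes, int]:
    """Store a block together with a CRC-32 check code computed at write time."""
    return data, zlib.crc32(data)


def read_block(stored: tuple[bytes, int]) -> bytes:
    """Return the block only if its check code still matches the data."""
    data, expected = stored
    if zlib.crc32(data) != expected:
        raise IOError("check code mismatch: block was damaged or modified")
    return data


block = write_block(b"customer data" * 8)
assert read_block(block) == b"customer data" * 8

# Flip one bit to simulate silent corruption somewhere between disk and network port.
corrupted = (bytes([block[0][0] ^ 0x01]) + block[0][1:], block[1])
try:
    read_block(corrupted)
except IOError as err:
    print("detected:", err)
```

The point of the sketch is that the check is recomputed on the read path, so damage introduced anywhere after the write is caught before the data leaves the system.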
GN-N Related to that is this goal of zero administration. What were the challenges in getting something like that to work with a mix of off-the-shelf and in-house software?
CM We met that challenge pretty much head on by not using any off-the-shelf software for that aspect of our systems. We developed our management systems completely from scratch in-house, partially because there really isn’t a Hewlett-Packard OpenView equivalent for storage. Everyone plugs network management code into HP OpenView because it has an open model for plug-ins, but there really isn’t any kind of industry-standard open storage management effort.
There are some things going on, however. We participate in a group called SNIA (Storage Networking Industry Association). We provide a set of APIs called Manage ONTAP, so that our customers—especially our large customers who are very much interested in building automation scripts and things like that—have the ability to control any of the knobs they want to control on our filers from their scripts.
We’ve found that companies bifurcate at a certain point. At first they just have system administrators, but then they get to a certain size and they split into storage administrators and data administrators. A storage administrator concentrates on what kind of storage hardware is spinning in the data center, how it can be fixed, how to diagnose its health, and how to grow it.
The data administrator is really an adjunct to the DBA. He’s the guy or department admin who says, “We need another six terabytes of home directory space,” or “This database is going to be 600 gigabytes.” He’s also the interface between the storage administrator, who has responsibility for the assets on the data-center floor, and the end user, whoever that might be.
We’ve spent a lot of time providing tools for both groups of people. But I would say that we spend a significant amount of energy doing nothing but storage management. We have an entire group, headed by director Louis Selincourt, just doing storage management.
The other aspect of storage management, which has really changed the world since the time we started, is that it used to be that 100 terabytes was a huge amount of storage to have deployed, yet today we have customers with 20 and 30 petabytes of storage. When you get that kind of storage deployed, it’s almost unfathomable for an individual to understand where all of that information might be or who has it or who needs it. That is why these tools have become an essential element of the storage manager’s ability to manage all that storage.
Instead of managing storage by saying, “I want these six spindles associated with this RAID group for this guy to have a volume,” you say, “I want really fast spindles.” The DFM (Data Fabric Manager), which is our storage application, allows you to specify that you want a volume that’s really fast, and then it will know which of the available filers have really fast disks on them.
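The difference between naming spindles and stating a policy can be sketched as follows. This is a minimal illustration, not the DFM interface: the `Filer` fields, the `place_volume` function, and the rule of preferring the filer with the most free space are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Filer:
    name: str
    disk_rpm: int    # hypothetical speed attribute
    free_tb: float   # hypothetical free capacity in terabytes


def place_volume(filers: list[Filer], size_tb: float, min_rpm: int) -> Filer:
    """Pick a filer that satisfies the policy; the caller never names one."""
    candidates = [f for f in filers if f.disk_rpm >= min_rpm and f.free_tb >= size_tb]
    if not candidates:
        raise LookupError("no filer satisfies the requested policy")
    # Assumed tie-break: prefer the filer with the most free space.
    chosen = max(candidates, key=lambda f: f.free_tb)
    chosen.free_tb -= size_tb
    return chosen


pool = [
    Filer("filer-a", disk_rpm=7200, free_tb=40.0),
    Filer("filer-b", disk_rpm=15000, free_tb=12.0),
    Filer("filer-c", disk_rpm=15000, free_tb=20.0),
]
print(place_volume(pool, size_tb=6.0, min_rpm=15000).name)  # filer-c
```

The request says only “6 TB on really fast disks”; which of the eligible filers ends up holding the volume is the tool’s decision.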
That’s been driven primarily by our large customers who have large amounts of storage. They like to call it policy-based management, as opposed to individual physical-based management, because no storage administrator really cares which of the 27 filers you get the storage on, but the administrator knows that these 27 are allocated for this kind of storage. Does that make sense?
GN-N Yes, it does make sense, although I want to go one level deeper. Under the hood, was the whole operating system layer done by you? Or are you guys using an actual common Linux or BSD operating system?
CM No, actually Data ONTAP is a completely in-house operating system built from scratch. The funny thing is, I like to say it evolved from an interrupt service routine because when NetApp first started, filers were pretty simple; all they did was sit around and wait for packets to come in from the network port. When they got a packet, they woke up and did whatever that packet wanted them to do, and then they would go back to sleep. That’s really the function of an interrupt service routine.
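The wait, service, sleep cycle described here can be sketched as a blocking loop. This is an illustrative model only, not Data ONTAP code: the packet format, the `serve` function, and the `None` sentinel that ends the demo are assumptions, and a real interrupt service routine would be driven by hardware rather than a Python queue.

```python
import queue


def serve(inbox: queue.Queue, store: dict) -> list:
    """Sleep until a packet arrives, service it, then go back to waiting."""
    replies = []
    while True:
        packet = inbox.get()          # blocks (sleeps) until a packet arrives
        if packet is None:            # sentinel used only to end this demo
            return replies
        if packet["op"] == "write":
            store[packet["path"]] = packet["data"]
            replies.append("ok")
        elif packet["op"] == "read":
            replies.append(store.get(packet["path"]))
        else:
            replies.append("unsupported")


inbox = queue.Queue()
for p in ({"op": "write", "path": "/home/a", "data": b"hi"},
          {"op": "read", "path": "/home/a"},
          None):
    inbox.put(p)
print(serve(inbox, {}))  # ['ok', b'hi']
```

A loop like this does exactly one thing at a time, which is why it stops being sufficient once background work needs to run alongside request handling.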
As the features and capabilities of the filer grew, we determined that we needed to have the ability to do other things at the same time, like DNS. One of the things that our filers do these days is run a network time client so that we can keep the time on the filers synchronized very closely with a clock. People who are doing software development, for example, want their timestamps to be very accurate.
So those are the kinds of things that evolved, and Data ONTAP has grown with that. The entire kernel is around 50,000 lines of code, so it’s pretty small. That being said, as we evolve into this storage-grid architecture going forward—and I’m sure you’ve heard a lot about that in the popular press—there are opportunities for us to use open source systems to provide a host environment for user-level tools. As resources on our processors become available to do that, I think you’ll see more of that, both from us and from others.
GN-N Among those who have built appliances starting from something like FreeBSD or Linux, there’s always the question, “What did you have to remove to make it work?” You can’t, I suspect, run a filer with syslog.
CM At FreeGate we actually started with Linux, but Linux frothed too much. There were so many developers, and it was changing so frequently, that it was hard for us to get to a stable position. So we switched over to FreeBSD 2.0 and 2.1, and it worked out really well. We stabilized it. We removed all the stuff we didn’t need, and we had a pretty stripped-down version of the operating system that did just what we wanted it to do. We determined, however, that it was impossible for us to maintain currency with the FreeBSD community.
They could fix bugs and add features faster than we as a small startup could integrate them and make sure that all of our stuff still worked.
So once we started shipping our product, the FreeGate Gateway, we pretty much had taken an amalgam of 2.1 through 2.8, and that was our kernel of choice, and it really wasn’t changing. We weren’t adding new processors, we weren’t adding anything else new, so it really didn’t need to change. When FreeBSD was going off to versions 3 and 4, we were still shipping 2.75, or whatever, and that was just fine for us.
GN-N I think that’s a very common model. I know in my experience in embedded systems, people will build a box and just never change anything. If they’re going to upgrade, they change everything.
CM And that’s the appliance, right?
GN-N Yes, exactly.
CM It’s not a server. I often remind people that we’re not a server company; we’re an appliance company.
GN-N When you say that, what do you mean?
CM A computer server is designed around the ability to run programs that have a number of resources that are provided by the kernel and the libraries, which the programs can call, but the programs that run are determined by the owner.
An appliance actually does one thing or one class of things, and the multiprocessingness of it, if you will (the number of threads), is really related to how many instances of that one thing it’s doing.
So when talking about Solaris, for example, you might size your Solaris purchase by the number of users you can support or perhaps the largest application you want those users to run. There’s a lot of stuff in Solaris that is about providing services for different kinds of applications, and every time you log onto Solaris, you get your own little environment that is pretty much ready to do whatever you want it to do.
You have a directory of applications and you type Go Oracle, or whatever, and it will start loading that application and become Oracle for you, whereas in a storage appliance (or any appliance), it does only one thing. In our case, it does storage, and when you “log onto the box,” you don’t really log onto it. What you do is request a service from it. You request storage from it. You want to mount a home directory, or you want to open up a LUN.
So, you really have multiple paths or multiple instances of people talking to LUNs or talking to home directories, but you don’t have them doing different things on the appliance. In a server operating system, everything happens in user land because people log on and become little instances, but in an appliance operating system everything happens in kernel land. There are no user processes per se. They’re all just service threads. It’s just a different model.
GN-N That brings up an interesting challenge. You might start adding complex services from outside, such as your network time client, or CIFS, or perhaps other subsystems that you need to make the appliance work but that wouldn’t normally be thought of as part of an appliance. In a server system, that would be an application. You would run the NTP daemon. You would run the CIFS daemon. How do you address that in an appliance and make sure it doesn’t interfere with anything else?
CM To be fast and efficient, we generally have a single name space, so you can pass pointers around and they work everywhere in the system. The good news is that if you have a POSIX-compatible API, or a Unix libc interface, the code really doesn’t care if it’s running as a user process or if it’s running as a lightweight thread in your appliance operating system.
The trick is that bugs in that code will cause problems on your system, but the good news is that because it’s an internal service that only our code is calling into, you can know exactly every single code path that service can go through. Because of that, you can go through and verify that the parts of the service you’re using are really going to work.
GN-N I’d like to talk about security. Do you find any particular interesting challenges in providing a secure appliance? I guess that would be pretty high on your list.
CM It is. We’re always very cognizant of protecting data from not only errors in hardware but also malicious errors. Because our source code is not available, there aren’t people scanning every line of code to find out if there’s a buffer overflow issue, and sometimes there is. I certainly don’t think of that as a first line of defense, but it contributes. By the same token, by keeping the operating system and the actual firmware load targeted to exactly what it is that we’re trying to do, we can be pretty careful about the security of those applications.
The only way to talk to our filer is through a client port. You can talk to it only through a SCSI or Ethernet port when you’re doing NFS or CIFS, and we’ve implemented the front ends for all of those things. We’re very careful that our front ends don’t have any of the traditional exploits that you might find with inexperienced programmers, such as forgetting to check boundaries, but it’s impossible to run third-party code on our machines because that’s not what they do. They’re like ROMs. There isn’t an opportunity to create executable content that’s unexpected for us.
I think that’s the biggest weakness of most operating systems: If you have a way of putting executable content on an operating system that was not anticipated by the designer, you have an opportunity for something to happen. When I was in the Java group at Sun, part of my job was working on the security of Java. We had 12 patents. It’s really hard to create a trusted computing environment where you can guarantee that it’s impossible to subvert the security infrastructure that’s in place.
If you take a filer and pull out the CPU and then stick a logic analyzer on the bus, you’re going to be able to see what the CPU is doing, and I can’t protect against that.
You can imagine putting explosives inside it or something, but that’s a level of security that I am not willing to provide because my customers won’t buy it. The same thing is true with airplanes; if everybody who got on an airplane had to strip naked and wear a straitjacket, we could guarantee that none of the passengers could hijack the plane, and yet as a passenger service, no one would really go for it.
There’s a balance, and that’s where architecture comes into play. We have an advantage because we’re not a general-purpose execution engine. We can, through diligence on the code paths that we’re responsible for, ensure a much higher level of security than an off-the-shelf operating system where there is no predefined expectation of what that code path is going to be used for. I think that’s the standard of embedded systems in general.
GN-N Let me ask one more question for the Queue audience. These kinds of purpose-built systems definitely seem to be increasing. We see more and more of them out there. When people are going to work on these, what kinds of skills or systems should they be looking at?
CM The people we have the hardest time recruiting at NetApp, and would love to find more of, are the ones who really understand the boundary layer between the hardware and the software. They understand how the BIOS registers appear in the address space of a PC. They understand what properties an I/O card has to have for software to recognize it, to configure it, and use it in an efficient way while they maximize bus utilization and that kind of thing.
We can find application programmers. We can find code programmers. But finding people right on that fringe, between hardware and software, that seems to be the hardest.
Let me relate a short conversation I had with my father. When I was growing up, I loved computers. I thought they were really neat and I wanted to program them. When I was getting ready to go to college, I asked my dad what major I should have, and he asked me, “What are your choices?”
“Well, I can get a computer science degree, which will give me the credentials to program computers, or I can get an electrical engineering degree, which will give me the credentials to build them.”
“If you have a double E degree, will they still let you program them?” Dad asked.
“Yes. People who have art degrees can program—programming is a little bit different from engineering.”
“OK,” he said, “but if you have a software degree in computer science, do you think they’ll let you build them?”
“No,” I said, “they don’t want to trust that to people who have never actually built hardware.”
“So,” he said, “if you don’t know what you want to do, then you obviously have to get a hardware degree because that way you can do either.”
It turned out to be a very prescient sort of thought. My degree is in electrical engineering, and I grew up building computers and programming them at the same time. That mix of being able to go really deep in software, as well as really deep in hardware, has always served me well in finding the right balance in an embedded system, because the challenge that you face in embedded systems, unlike other systems, is this: What do you do in hardware and what do you do in software?
The stuff you do in software is stuff that you can change easily in the field, and it’s where you do value-added features; and the stuff you do in hardware is what you want to do all the time, and you want it to be fast and efficient. It can actually be less expensive to acquire it in hardware form than writing it in software. Understanding that balance has always been important for people who want to build embedded systems.
Originally published in Queue vol. 4, no. 3