Interviews

A Conversation with Cory Doctorow and Hal Stern

Considering the open source approach

For years, the software industry has used open source, community-based methods of developing and improving software—in many cases offering products for free. Other industries, such as publishing and music, are just beginning to embrace more liberal approaches to copyright and intellectual property. This month Queue is delighted to have a representative from each of these camps join us for a discussion of what’s behind some of these trends, as well as hot-topic issues such as identity management, privacy, and trust.

From the software industry is Hal Stern, senior vice president of systems engineering at Sun Microsystems. During his 17 years at Sun, Stern has held a number of positions, including CTO of software, CTO of Sun services, and chief architect of Sun professional services. Stern was involved in the evangelism and business issues surrounding the open sourcing of Solaris and Sun’s GlassFish Web application stack. One of his main interests, growing out of these efforts, is how the economics of open source software drives wide adoption and monetization, and how it ultimately shapes Sun’s strategies and products.

You can read about Stern’s other diverse interests on his blog (http://www.blogs.sun.com/stern). There you’ll likely find musings on Cory Doctorow, a science fiction writer of whom Stern is a big fan. Doctorow has published three novels and is co-editor of the popular blog Boing Boing. His latest collection of short stories, Overclocked: Stories from the Future Present, and all of his novels are available as free downloads from his Web site (http://www.craphound.com) under a Creative Commons license. Doctorow is known for his progressive views on both digital rights and free software, and last year completed a four-year stint as European affairs coordinator for the Electronic Frontier Foundation, a nonprofit organization committed to defending free speech, privacy, innovation, and consumer rights on the Internet. Doctorow and Stern recently sat down together during a conference Sun held in San Francisco and allowed us to tape their conversation and snap a few photos.

CORY DOCTOROW Sun is an open-standard, open-hardware, open-content kind of business. What’s driving Sun to open source its software and now its hardware as well?

HAL STERN I think that we have come through a social shift in our thinking about open source. We’ve been moving from an historical, almost fair-use kind of conversation, in which we ask, “What could you really use the content for, and what’s a fair way to excerpt it?” to asking, “If people have access to it and can look at it and understand it and then derive something from it, how might they create other opportunities?” And in particular, “How is it market-expanding?”

Take the simple example of Ubuntu Linux, which for a long time didn’t run on any SPARC hardware that we made. Therefore, anyone who liked Ubuntu Linux was by definition not a Sun SPARC customer. As soon as we got it up and running on our hardware, they could become SPARC hardware customers. So rather than being seen as cannibalizing other sales, it was expanding the market.

CD How has open source changed the process of creating standards?

HS We tend to get involved in the standards efforts that are developed almost the way software historically has been developed: a lot of people work on a draft, and then they fine-tune it a little bit, and eventually you get working implementations of the standard. It’s a very classic waterfall model of development.

If you look at the way that software is being developed now or that services are being deployed on a network, you will see that they’re evolving rapidly. You do something, it works, and then you fine-tune it a little bit, or people find a new way of using it. Then you evolve from there. The very small-footprint standards are going to be the most useful, and they’re undoubtedly going to evolve much more in public, by people using them and trying to come up with practical applications. There’s a lot more applied engineering than the theoretical, almost formal-language-design work that would normally go into developing a standard.

I frequently go back to what Steve Deering (IP developer, formerly of Xerox PARC, now at Cisco Systems) said in an interview years ago about TCP/IP. Asked why IP didn’t include features that, at the time, seemed obvious, Deering replied that IP is the thin waist of the protocol stack. There’s a lot of innovation below, in terms of all the different types of networks that can carry IP packets, and a lot of innovation above, in terms of all the protocols built on it; the fact is, IP was very simple in its design, and that simplicity facilitated innovation on either side of it. I think that’s the kind of standards development we want to get to: What are going to be those simple things that facilitate innovation above and below? Clearly they have to be accessible and open to everybody to get that kind of innovation. Then we have to start looking at innovation on the physical side of things: How can we drive the costs down? How can we drive the complexity out, thus making it more broadly accessible—both economically and in terms of usability?

CD In Bruce Sterling’s recent book, Shaping Things, he talked about a future in which we would get telemetry off of devices, not while in use, but at the end of their life cycles. We might analyze the world’s landfills to figure out whose stuff ended up in them. Sterling asked whether devices could ever be designed so that they gracefully decompose into parts that could be folded back into a manufacturing stream. Do you think there’s a future for that?

HS We’re talking about decoupling data from the device that held it while useful, and about changing the time-space trade-offs around that data to improve both privacy and long-term sustainability. The problem with realtime telemetry from devices is that it creates a privacy issue. You can see what someone is doing or has done in a recent time window; or, if you are aggregating many telemetry streams together, you learn what works and what doesn’t—in effect, what people are doing with the devices producing those streams. The risk is that if you have enough sources of data that can be combined and searched online, clever searches can begin to infer things that you might prefer stayed private. Taking telemetry from devices at the end of their lives lets us get some of the use-case information about what works and what doesn’t, or what kind of unintended user consequences we introduced—without the possibility of accidentally disclosing private information in realtime, when it would have the most impact.

The recycling side looks at how we deal with the devices over long periods of time. One of Sun’s goals is to get our systems to the point where less than 10 percent of the system would end up in a landfill; everything else gets recycled. If we encourage that, and make it easier to recycle consumer devices, then the end-of-life telemetry becomes more useful and we have less heartburn over a shorter cycle of going through our personal devices. The idea of looking at the long-term ecological impact of our products isn’t limited to our industry. Pharmaceutical companies look at the by-products of manufacturing and metabolism, so that if you take a drug to relieve one symptom, you don’t end up polluting the environment for everyone else when your body flushes the by-products.

CD A question about hardware design that’s very fraught these days is whether hardware can be used to keep secrets, and from whom it can keep secrets. There are times when we would love for our hardware to keep secrets. Microsoft BitLocker is a good example. It uses hardware to protect the keys that encrypt your hard drive, so that if someone steals your laptop, the thief can’t recover your files. We all want that, but at the same time, I think many of us are skeptical of something like an Xbox that uses hardware locking to keep secrets from you. It keeps you from booting the Xbox with your chosen operating system.

HS As we start to look at the issues of identity and security and privacy, we also come up against trust. What is the purpose of actually keeping a secret? It’s so you can either control the flow of information where there is no trust, or validate information where there’s imperfect trust or less-than-ideal secrecy or security. You start to build up a model of which particular threats you’re worried about and how those threats present themselves, and then you can ask, “Well, where is it that I need to go and enforce protection?” Is it in keeping unencrypted things on my laptop, or is it that if I just keep everything in a network file-storage mechanism somewhere, that’s as safe as keeping my money at the bank and just using the ATM for cash and cache, in both homophonic interpretations of the word?

I worry about accidentally divorcing people from their content. In the short term there are things like theft, or losing your laptop with your book on it. But over longer periods of time, we have to worry about the encoding of the data. Will we actually know how to interpret it five, 10, 50 years from now? I don’t think we have that much experience with it. My mom probably asks me at least every six months when she can throw away the paper tape that’s in my old bedroom. It’s a very retro technology placeholder from 25 years ago, but that was the preferred storage and transfer mechanism, lacking anything else, and my Radio Shack TRS-80 with the cassette-tape backup was a big improvement over it. It’s hard data, but where are you going to find an ASR 33 Teletype with a paper-tape reader on it? In some museum somewhere...

CD I thought I knew the answer to this conundrum. Maybe you can tell me if I’m being naive here, but every computer I’ve bought has had about twice as much hard-drive storage as the previous one, and so basically I could pour everything I had from the old computer into the new computer, and still have that much space again. I never had to worry about physical access to the bits. You’re right, paper tape is a special case, but we’re beyond the realm of paper tape now. Generally speaking, it’s all online storage.

I’ve always been able to acquire emulators for every previous generation of hardware. Now I worry that trust and trusted computing, which is supposed to distinguish between emulated and physical hardware, might subvert this. I can emulate your PDP-11—I can emulate your PDP-11 on my watch! We can emulate a C-64 in the chip that’s in the joystick that used to come with the C-64 as a peripheral. So why isn’t that the answer to this?

HS From a hardware perspective, absolutely it is. This is Moore’s law making every previous generation of hardware available through emulation. It’s not a question of the devices, but rather the software and the data that run on those devices.

Then the question, from a security and encryption perspective, is: Can you run the particular software on whatever it is you’re emulating, and do you have permission to run that software? Is it possible to buy a copy of the application you need to read back an old file format, and if the answer is no, could you write something to read it? What if it’s encrypted or content protected? Even if the encryption was done not in a cryptographically well-engineered way but in a content-protection vein—not so much “we’re going to force secrecy here” as “we’re just going to make sure that no one else sees the content”—the mechanism is still protected under the DMCA (Digital Millennium Copyright Act).

There are all kinds of interpretations of the DMCA that say you can’t write a decoder because you would be violating federal law in trying to read back your own content, and that to me is this digital divorce in which you end up putting something through the trapdoor and you’re on the other side saying, “I’ve now signed up to a lifetime of software requirements making sure I can always read it back.”

Where we’re going to need standards and interoperability is in making sure that we always have access to that content. I think this was perhaps missed in the Commonwealth of Massachusetts decision almost two years ago, in which the state committed to using only office productivity software that read and wrote open formats. It wasn’t making a decision about any particular vendor; rather, it was trying to make sure that it would always be able to get at the bits in some nice XML- or ASCII-readable form, and that the bits would still be useful. You want to make sure that you can recover what the intent was many years later.

CD Natalie Jeremijenko at UC San Diego calls that “legible computing.”

HS I like that. I think there is a whole set of social aspects of computing that we’re just starting to worry about. My first experience with computers came when I ended up with a lot of free time in high school. At that point, computing was simply about what you could and could not do. The social implication of the technology then was writing an emulator for the BASIC interpreter environment so that we could have it print smart-aleck responses when unsuspecting, less nerdy students sat down at the Teletype while we giggled from around the corner. Twenty years ago we were more concerned with what was computable and what wasn’t, and we ended up investing in computing optimizations. Researchers at Bell Labs came up with the 2-opt algorithm to approximate “good enough” solutions to the TSP (traveling salesman problem) because we just needed to be able to solve that broad class of problem. Nobody worried about any kind of computing by-product.
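For readers who haven’t encountered it, here is a minimal sketch of the 2-opt idea in Python. It is illustrative only, not the Bell Labs implementation; a serious solver would compare just the two edges each reversal changes rather than recomputing whole tour lengths.

    import math
    import random

    def tour_length(tour, pts):
        # Total length of the closed tour that visits pts in the given order.
        return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
                   for i in range(len(tour)))

    def two_opt(tour, pts):
        # Keep reversing segments while any reversal shortens the tour;
        # stop at a local optimum -- the "good enough" solution Stern describes.
        improved = True
        while improved:
            improved = False
            for i in range(1, len(tour) - 1):
                for j in range(i + 1, len(tour)):
                    candidate = tour[:i] + tour[i:j][::-1] + tour[j:]
                    if tour_length(candidate, pts) < tour_length(tour, pts):
                        tour, improved = candidate, True
        return tour

    random.seed(1)
    pts = [(random.random(), random.random()) for _ in range(25)]
    route = two_opt(list(range(len(pts))), pts)
    print(round(tour_length(route, pts), 3))

The result is only locally optimal, not guaranteed optimal, which is exactly the trade-off being described: an approximation cheap enough to run on the hardware of the day.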

I think the biggest social concern of computing today is around privacy and trust: I’m going to be representing myself online, and everything I put out there is going to get archived somewhere. Today you write a blog entry that is picked up by your favorite feeds, and later you decide you want to recant that entry. Once it has been said and syndicated elsewhere, however, you can’t. It’s there.

CD I’ve heard of these big breaches—they have been called privacy Valdez or data Valdez—and those of us who were early on the Internet lived through one, with the advent of Deja News (the archive of messages written to Usenet). Usenet, which we thought was written on water, had people on it who later turned out to be CEOs of billion-dollar companies talking about how deep they went in the k-hole last weekend while at the Dead show.

Suddenly all that stuff was searchable and indexed to people’s actual names! I think a lot of us worry about a future data Valdez, such as a good facial recognition algorithm that might be able to figure out every photo of you on the Internet, even if it doesn’t have any title or surrounding context that identifies it as you.

HS There’s the issue of how you identify yourself in public—in meatspace public and in electronic public. I think we’ve gone through a narrowing phase in which we want more and more centralized control, with the emphasis on identity management and access controls coming mostly from the enterprise computing space. We have single corporate systems of record, and then we figure out what users can do, permission-wise, from that centralized book of roles and responsibilities. For access to my company’s records, certainly in our current mood and mode of regulation, that’s required. But that’s my work life, which is a small portion of what I do online.

I also go to eBay, my bank, and various organizations where I volunteer, and I would like to disclose some elements of my life to each of those organizations, but there’s no reason for them all to see one another’s data and be able to aggregate it.

The main question of this issue of identity and trust is, where are your bits going and who has them? You’re likely to have many identities online, and people tend to think of that as maybe a bad thing. I tend to think of that as maybe a good thing because I’m in a number of different circles that intersect only in very small areas.

I can look at the notion of having my identity highly distributed, almost like a RAID array: if it’s distributed and I know where all the pieces are, I can detect whether it has been hacked or compromised or even used in a way it wasn’t meant to be used. That puts some onus on me to keep track of all that, but at the same time, it means that not everybody has to have perfect information about me. Even if one piece of the data is compromised, so what? So you learn my street address. A lot of people know that; it’s public information. But the other information you would need to commit financial fraud against me, or even to mount a denial-of-service attack on me, might be hard to get.
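Stern’s RAID analogy can be made concrete with a toy Python sketch. This is an illustrative assumption, not a description of any real identity system: the data is split into shards, any single shard is statistically random (so one compromised holder learns nothing), and a per-shard digest lets the owner detect tampering.

    import hashlib
    import os
    from functools import reduce

    def xor(a, b):
        # Bytewise XOR of two equal-length byte strings.
        return bytes(x ^ y for x, y in zip(a, b))

    def split_secret(secret, n):
        # n-of-n split: n-1 random shards plus one "parity" shard.
        # Any subset of fewer than n shards is indistinguishable from noise.
        shards = [os.urandom(len(secret)) for _ in range(n - 1)]
        shards.append(reduce(xor, shards, secret))
        # Record a digest per shard so later tampering is detectable.
        digests = [hashlib.sha256(s).hexdigest() for s in shards]
        return shards, digests

    def recombine(shards, digests):
        # Verify every shard before XOR-ing them back into the secret.
        for i, (s, d) in enumerate(zip(shards, digests)):
            if hashlib.sha256(s).hexdigest() != d:
                raise ValueError("shard %d was altered or compromised" % i)
        return reduce(xor, shards)

    shards, digests = split_secret(b"123 Main Street / acct 55-0100", 4)
    assert recombine(shards, digests) == b"123 Main Street / acct 55-0100"

Unlike a real RAID array, this toy n-of-n split has no redundancy: lose one shard and the secret is gone. Real designs use threshold secret sharing to survive lost shards while keeping the property that no single holder learns anything.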

CD Unless I park at the end of your driveway.

HS In that case, you’re always welcome to come inside for a beer. But I think that part of the concern now is not just that all the data is there, but that with all of it being accessible through Google or your favorite search engine, you can now start drawing interesting inferences and joining sources of data that were very, very hard to get before. Access to a document entered in a court case historically would have been limited to the occupants of a closed room: lawyers, the judge, and perhaps a witness who actually saw the relevant piece of evidence. If you were not in that closed room, you had no idea what was there. Now evidence shows up in an online court proceeding, so there’s a massive disclosure of things that used to be visible only in that private realm, and the visibility of that data has expanded. Anybody who can use Google can start to cross-reference things that used to be kept physically private.

CD We had an interesting incident with Boing Boing in which a company that makes a Web filter, a censorware program, classed us as pornography. The company offered us a deal: they would unlock us if we would rearrange the site in such a way that would make it easier for them to censor a few posts. We basically told them to get bent.

Then one of our irate readers googled the name of the guy we had been dealing with and discovered that this guy was a diaper fetishist in his spare time. The thing that was striking about this wasn’t that he was weird, because we’re all weird in our own way; it was that he wanted to use his power to deny other people the capability of being just as weird as he was. Being a diaper fetishist was a hard thing to do before the Internet came along and made it possible to find alt.fetish.diaper.

HS There’s good news, bad news, and worse news here. The good news is clearly that whatever your degree of weirdness, you can find an online community for it. You can also find weirdness that we want to aggregate. I think 10 years ago the number of drug companies interested in treating Asperger’s syndrome was near zero. Then Wired ran an article about how Asperger’s and autism are on the rise in places like northern California, and about the fact that you can go online and find communities with hundreds of thousands of people in them, which makes them an interesting market. There’s a power of aggregation that has a socially good aspect to it.

On the flip side, as soon as you want to control where the aggregation occurs, or in some cases you want to control how people are going to find these points of aggregation, you’re taking away people’s ability to make decisions about what they like and don’t like. I had a remarkably wise junior high school music teacher who said that there’s good music and bad music, and there’s music you like and music that you don’t like. They’re not the same thing. You can like bad music, and there’s plenty of good music that you may not like, or you may not like yet. Don’t be confused by that.

So when you start to talk about filtering in the sense of Internet safety and security, you start to look at who is making those decisions rather than whether you are taking personal responsibility for deciding what you’re going to like and not like.

I worry that we’re losing that ability, and certainly as a parent, I worry about how we convey that individual responsibility about what you do and do not do online. For example, certain kinds of content sharing are good because you’re distributing content. Other kinds of content sharing are bad because you are denying the artist who is making your music or writing your books their compensation. Conveying the moral sense of right and wrong, the sense of individual responsibility, is a lot harder than saying, “Don’t steal candy from the 7-11.”

CD It seems to me that one of the big problems with the filters you’ve just identified is who gets to set policy in the machine. As a science fiction writer, I am offended by sci-fi movies where it turns out that the rocket ship has a self-destruct button, it has been pressed by accident, and now the whole thing is going to explode. I always wonder, wouldn’t it be simpler not to put a self-destruct button in a rocket ship? Isn’t it just better engineering not to have a mode where the rocket ship blows itself up?

By the same token, I often wonder whether trusted computing architectures that allow remote parties to enforce policy on your hardware are a good idea. Although we can imagine beneficent examples of this, this is what spyware is, by definition, right? Spyware is remote parties setting policies on your computer against your wishes. Is it ever a good idea? Will it ever be a good idea? Can we think of a trusted computing architecture that improves your security on the Internet, or is this always going to be installing a self-destruct button?

HS There are two parts to that answer. There’s a question of whether I have some certainty about the software I am running and whether I can validate that it’s running on a hardware device that I believe is correct, in the sense that its components have been validated. I need to know at some point that I’m going to get the result that I expect out of this software—as opposed to it being compromised or including a Trojan or that I’ve executed the wrong path to it—and that I now have a reasonable degree of trust in what I’m about to do.

The question after that is, so now what am I going to do, and is there anybody inflicting policy on that? I’d like to know that my software and my system are operating correctly. What comes next, I think, is a very hard thing to inflict policy on. I like your example of the self-destruct button on the rocket ship. Is it that useful when you’re billions of miles and light-years from home? If you expect ever to get back, probably not.

What was the point of putting it in there? Well, if at some point you determine that it’s not operating correctly, sometimes the right thing to do is to stop it from operating. To quote Sun’s David Yen, “When you realize that you’re driving drunk, the best thing you can do is stop driving.” What happens next is a set of other policy directives. Have you actually solved the problem by pulling off the road and taking a nap or calling a cab? In that case, it’s a good thing. Pressing the self-destruct button on the car will also solve the problem but introduces a host of other liability and social issues.

I think it comes down to this: What is it we are trying to control? What is it we are trying to regulate or in some cases inspect? As you’ve said before, if you’re looking for a needle in a haystack, sometimes the answer is just to get a bigger haystack and to improve our policing. To put this in the context of digital content, you can use all the restrictive DRM (digital rights management) you want. As long as human beings continue to have analog inputs, it’s hard to control every possible way in which content is going to be reconverted into digital form. There’s a certain amount of diminishing returns there.

After a while, you say, how much are we spending to control content? I would much rather find a bigger haystack. Let’s look for the people who are clearly engaging in things that look like piracy or clearly doing things that look like invalid uses or redistribution of the content. You want to find out where your content piracy is happening.

Which is the one that you’re going to try to regulate? After you watch a DVD you bought off of some guy’s blanket on a New York City street—and you realize that you’re watching a copy of the movie made by someone with a handheld camcorder, with a guy behind him asking for more popcorn and coughing—in some cases, if you liked it, you might go out and buy the real one. That’s market-expanding, even if the original content was pirated. There’s a reason that bands such as the Grateful Dead and Phish encouraged bootlegs: they reach more listeners.

Originally published in Queue vol. 5, no. 3
