jagged thoughts

Hot computer systems observations and analyses from John Linwood Griffin

November 13, 2009

CCS 2009

16th Conference on Computer and Communication Security (CCS’09)
Chicago, Illinois
November 9-13, 2009

CCS is one of the top international security conferences (example topics: detecting kernel rootkits, RFID, privacy and anonymization networks, botnets, cryptography).  It is held annually in November.  This year there were 315 submitted papers from 31 countries, of which 18% were accepted after peer review.

I’ve attended CCS twice (2006 and 2009).  It is one of the best conferences I’ve ever attended: I find that the speakers describe practical, cutting-edge, informative results; I keep up with old acquaintances and meet new ones; I keep sharp and up-to-date as a research scientist.

Here are some of the major themes from this year:

* ASCII-compliant shellcode:  My favorite paper of the conference is “English Shellcode,” in which the authors developed a tool that takes malicious software as input and converts it into REAL ENGLISH PHRASES (taken from Wikipedia and Project Gutenberg) that execute natively on 32-bit x86.  If you read no other paper this year, you simply must read this one; it is mind-blowing.  Another paper built shellcode for the ARM architecture using only valid ASCII characters.  These demonstrations are important because ASCII (and especially English ASCII) is likely to be passed through by network intrusion detection systems.  The favorite paper is here:
http://www.cs.jhu.edu/~sam/ccs243-mason.pdf
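To see why printable shellcode matters, picture a deliberately naive filter that only inspects payloads that "look binary." The rule below is hypothetical and is not taken from any real IDS. It is a minimal sketch of the kind of heuristic that ASCII and English shellcode slip past. Many printable bytes are also valid x86 opcodes. For example, 0x50 is both the letter 'P' and the instruction push eax, and 0x58 is both 'X' and pop eax.

```python
import string

# Hypothetical, deliberately naive IDS rule: payloads made entirely of
# printable ASCII are treated as ordinary "text" and not inspected further.
PRINTABLE = frozenset(string.printable.encode("ascii"))


def looks_like_text(payload: bytes) -> bool:
    """Return True if every byte in the payload is printable ASCII."""
    return all(b in PRINTABLE for b in payload)


def naive_ids_flags(payload: bytes) -> bool:
    """Flag only payloads that contain non-printable (binary-looking) bytes."""
    return not looks_like_text(payload)


# A classic NOP sled plus a breakpoint is flagged...
print(naive_ids_flags(b"\x90\x90\x90\xcc"))  # True
# ...but printable bytes sail through, even though "PX" is push eax; pop eax.
print(naive_ids_flags(b"PX"))  # False
```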

* Cloud computing:  Few authors of cloud-related papers seemed to address the cloudiness of their work.  Instead (and disappointingly), they discussed generic distributed computing principles under a cloud umbrella.  The best cloud talk I saw was by Ian Foster, an invited speaker at the cloud security workshop, who described the transition from grid computing to cloud computing thus: grid was about federation; cloud is about infrastructure and hosting.  He pointed out that the grid folks did a good job of developing applications (e.g., for medical research) and executing analyses.  But in his view, the real game-changer in cloud computing is the advent of data distribution and sharing in the cloud.

* Anonymous communication:  There were several talks analyzing the efficacy of anonymization networks (mix networks, remailers, Tor, onion routing).  My takeaway is that these techniques work very well for latency-insensitive traffic (such as email), only moderately well for latency-sensitive traffic (such as web browsing), and not very well yet for high-bandwidth traffic (such as VoIP).  My favorite work was a poster on “Preventing SSL Traffic Analysis with Realistic Cover Traffic” (Nabil Schear and Nikita Borisov) where the authors change the statistical profile of your encrypted traffic such that existing analyses (such as measuring keystroke latencies) are impossible.
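Schear and Borisov’s realistic cover traffic is much more sophisticated than anything I’d sketch here. The underlying problem is simple to illustrate, though: encrypted traffic still leaks message sizes. Below is a minimal sketch of plain length padding, a simpler and weaker technique than the one in their poster. Each record is padded up to a fixed bucket size before encryption, so an observer sees only a few distinct lengths. The bucket sizes and the 2-byte length prefix are my own assumptions.

```python
import os

# Hypothetical bucket sizes in bytes; every record is padded up to one of these.
BUCKETS = (256, 1024, 4096, 16384)


def pad_to_bucket(record: bytes) -> bytes:
    """Pad a record with random filler up to the smallest bucket that fits it.

    A 2-byte big-endian length prefix lets the receiver strip the padding.
    (In this assumed model, padding happens before encryption.)
    """
    framed = len(record).to_bytes(2, "big") + record
    for size in BUCKETS:
        if len(framed) <= size:
            return framed + os.urandom(size - len(framed))
    raise ValueError("record too large for the largest bucket")


def unpad(padded: bytes) -> bytes:
    """Recover the original record using its length prefix."""
    n = int.from_bytes(padded[:2], "big")
    return padded[2:2 + n]


print(len(pad_to_bucket(b"GET /")))  # 256
```

This hides exact sizes but not timing. Keystroke-latency analysis works on timing, which is why cover traffic goes further than padding.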

* Off-client emulation:  Several speakers described a technique for client-server applications (such as game clients running on customers’ home computers) that helps ensure the correctness, robustness, or speed of the client application.  It’s impractical to run a complete copy of the client on the server (because one server handles many clients), so the authors generally create minimalist, server-efficient versions of the client (for example, a game client that contains no rendering code).  In the game example, the client would send the user’s commands (“turn left, walk forward”) to the server.  There, the minimalist client would verify that those commands didn’t result in an invalid state (such as walking through a wall), which would indicate cheating by the player.
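Here’s a minimal sketch of the server-side check in the game example. The server keeps only the map and the player’s position and heading (no rendering), then replays the client’s commands and rejects any that walk into a wall. The level layout, command strings, and function names are all hypothetical.

```python
# Hypothetical level map: '#' is a wall, '.' is open floor.
LEVEL = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]

# Headings in clockwise order (north, east, south, west) as (row, col) deltas.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


def replay(commands, start=(1, 1), heading=1):
    """Replay client commands on the server.

    Returns the final (row, col) position, or raises ValueError if a
    command would put the player in an invalid state (a likely cheat).
    """
    row, col = start
    for cmd in commands:
        if cmd == "turn left":
            heading = (heading - 1) % 4
        elif cmd == "turn right":
            heading = (heading + 1) % 4
        elif cmd == "walk forward":
            dr, dc = HEADINGS[heading]
            nr, nc = row + dr, col + dc
            if LEVEL[nr][nc] == "#":
                raise ValueError(f"invalid move into wall at {(nr, nc)}")
            row, col = nr, nc
        else:
            raise ValueError(f"unknown command {cmd!r}")
    return row, col


print(replay(["walk forward", "walk forward"]))  # (1, 3)
```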

* Function-call graphs:  A function-call graph records which functions call which others.  It is a well-known way to trace how an application executes, since the graph captures the application’s control flow.  The technique kept popping up during the conference.  One use was identifying when someone has violated your software license by including your source code in their application.  Another was running inside a hypervisor to detect a kernel rootkit in a virtual machine from the different hypercalls it makes.  One attendee I had lunch with was very critical of the function-call graph technique (using an argument I didn’t really follow), but otherwise the technique seems useful.
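As a rough illustration of the license-violation use, you can build each program’s call graph as a set of caller→callee edges and measure how much two graphs overlap. The Jaccard similarity below is a generic measure I’m swapping in, not necessarily what the papers used. The function names and the traces are hypothetical.

```python
def call_graph(trace):
    """Build a call graph as a set of (caller, callee) edges from observed calls."""
    return {(caller, callee) for caller, callee in trace}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two edge sets (1.0 if both empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


original = call_graph([("main", "parse"), ("parse", "lex"), ("main", "emit")])
suspect = call_graph([("main", "parse"), ("parse", "lex"),
                      ("main", "emit"), ("main", "log")])
print(jaccard(original, suspect))  # 0.75
```

A high score suggests copied structure. Real systems would need to cope with renamed functions and deliberate obfuscation, which plain edge matching does not.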

* Power grids:  The currently hot topic in security research is power grids and smart meters.  There are projects at Penn State, Carnegie Mellon, and Johns Hopkins at least, and, I’m certain, at many other places.  There was a tutorial, a paper, and several posters all discussing security issues in the power grid.  The most interesting aspect to me was attacks against state estimators: the researchers described techniques to manipulate the system components involved in measuring and predicting the state of generators, transmission lines, etc.  However, the research community still suffers from a dearth of real-world information about how these networks operate and where the real vulnerabilities might be.
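To make the state-estimator attack concrete, here is the standard linearized (DC) state-estimation model as I understand it. I’m assuming this is the flavor of attack the speakers meant. Measurements $z$ relate to the system state $x$ through a known matrix $H$, and the estimator solves weighted least squares. Bad-data detection raises an alarm when the residual is large. If an attacker who knows $H$ adds $a = Hc$ to the measurements, the estimate shifts by an arbitrary $c$ while the residual, and hence the alarm, does not change at all:

```latex
\begin{aligned}
z &= Hx + e \\
\hat{x} &= (H^{\top} W H)^{-1} H^{\top} W z \\
r &= z - H\hat{x}, \qquad \text{alarm if } \lVert r \rVert > \tau \\
z_a &= z + a, \qquad a = Hc \\
\hat{x}_a &= (H^{\top} W H)^{-1} H^{\top} W (z + Hc) = \hat{x} + c \\
r_a &= z + Hc - H(\hat{x} + c) = z - H\hat{x} = r
\end{aligned}
```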

* RFID:  As we already know, it is possible to do RFID well, but none of the RFID implementations actually deployed do it well.  One classic example from a speaker was the RFID-enabled driver’s licenses issued in Washington State (in advance of the Winter Olympics).  They include a KILL command that is supposed to be protected by a unique PIN but in reality is left at a default PIN…meaning that anyone with a transmitter and sufficient power could kill a device.

* Ethical standards for security researchers:  One paper raised an ethical issue in its appendix (how can we do security research inside Amazon’s cloud computing infrastructure in a manner that doesn’t violate their terms of service?) and some researchers from the Stevens Institute have published a report and are organizing a workshop to investigate ethical standards for security researchers.  I didn’t really agree with many of the points made (my ethical line is drawn much further to the left: security researchers should have few constraints) but it was a hotly discussed and debated issue during the session breaks.

Wolfram Schulte at Microsoft Research gave an invited workshop talk on their Singularity OS project (reinventing the OS from scratch, using software-enforced isolation instead of relying on hardware memory management techniques).  It’s an interesting project but impractical, since it would require a widescale retooling by developers, during which very little development would happen for a while.  The work was inspired by his team’s frustration with using best-practices formal verification (etc.) techniques for software development.  Or, taken another way: a blue-sky team tried to use existing techniques to develop and prove major software projects, and it was so frustrating that they gave up.  That doesn’t bode well for using those techniques extensively in any real-world software development project (although they can still be very useful and insightful…just frustrating).

Also a shout-out to my student Brendan O’Connor for delivering a well-received talk on stock markets for reputation at the digital identity workshop.

Filed under: reviews — JLG @ 9:31 pm

November 14, 2008

Information Assurance Conference

In November 2008 I attended an “Information Assurance Conference” in Arlington, Virginia. This was a non-refereed two-day workshop of 30-minute talks on policy-level IA issues in the DoD and homeland security environments. The most interesting takeaways were:

  • If you are an organization that wants information assurance, give someone the high-level independent power to veto (or vet) which applications are allowed to use the network.

The U.S. Marine Corps has an outstanding example of this power used successfully: “the HQMC IA division will be the single point of contact within the marine corps for IA program, policy matters and oversight…[Mr. Ray Letteer] has authority to approve or disapprove an application or system for connection to [all Marine Corps core networks].” And, according to Ray (the speaker), the USMC really has given him the teeth to enforce his team’s IA policies.

Such a position of course requires diplomacy and tact: Ray mentions that he carefully vets the classifications of potential vulnerabilities to make sure only applications with demonstrable and unmitigatable vulnerabilities are ultimately banned from the network; he describes his role as translating geek-speak to the senior officers to convey the need for the restrictions his team enforces.

After a cursory look, I feel that this USMC approach could serve as a best-practices reference model for many other large organizations. Another speaker noted that the traditional corporate and DoD approach is local administration: each division-sized entity has its own IA unit as part of its IT function. The military, by contrast, is moving toward a single unifying enforcement point staffed by well-trained operators. (I asked “isn’t homogeneity terrifying?”; other speakers responded that homogeneity doesn’t have to mean single-point-of-failure. They are not talking about one point of deployment; they are talking about unified policy across all points of deployment.)

  • If you need an ROI (return on investment) story to sell an IA strategy to your management, you’re in luck.

Three speakers emphasized the availability of ROI metrics. Joe Jarzombek described the free software assurance tools that are available from the Department of Homeland Security. As part of that effort, DHS published seven articles on making a business case for software assurance and recently held a workshop on the topic. A sample title is “A Common Sense Way to Make the Business Case for Software Assurance”; see the “Business Case” section of the DHS software assurance site.

Two other speakers suggested taking a nonstandard approach in selling security investments to your upper management: instead of justifying your existence, focus on demonstrating your continued competence. For example, present graphical weekly metrics of how many port scans you thwarted or how many new security vulnerabilities were announced by antivirus companies that you prevented from affecting your network.

Or, pick some of the low-hanging fruit to impress the bosses: Dr. Eric Cole of Lockheed-Martin mentioned a client engagement where his team was asked to suggest architectural changes to a network that was operating at 99% utilization. After looking at the network traffic, his team simply blocked 74% of the outgoing connections (i.e., those connections which could not be traced to a business purpose). Nobody complained, and the utilization was reduced to 55% at no cost to the customer.

  • If you are not a member of senior management, you need to learn to speak the language of senior management.

This theme came up over and over during the workshop. “Speak the language of executives — translate your geek-speak into business objectives!” All I can say is: I agree.

Four other quick notes from the workshop:

Whitelisting: One speaker mentioned a trend toward whitelisting web sites as a means of IA in military computer networks. (Whitelisting is enumerating the list of acceptable sites and denying access to any other sites.) I hadn’t heard that before — can anyone confirm you’re seeing this?
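For readers who haven’t seen whitelisting in code, the core is a default-deny check. A request is allowed only if its host is an approved domain or a subdomain of one, and everything else is refused. Here’s a minimal sketch; the allowed domains and function name are hypothetical. The leading dot in the suffix test matters: it keeps look-alikes such as `badexample.mil` or `example.mil.attacker.test` from matching.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from policy.
ALLOWED = {"example.mil", "example.gov"}


def is_allowed(url: str, allowed: set[str]) -> bool:
    """Default-deny: allow a URL only if its host is, or is under, an allowed domain."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    return any(host == d or host.endswith("." + d) for d in allowed)


print(is_allowed("https://www.example.mil/index.html", ALLOWED))  # True
print(is_allowed("https://example.mil.attacker.test/", ALLOWED))  # False
```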

COTS: Is COTS still on the rise? Some speakers and attendees noted a trend toward COTS software and hardware, chiefly for the purchase costs and especially the (comparatively low) maintenance costs. Others noted that there remain many applications, especially in classified domains, where commercial vendors are unwilling to tweak their product to fit the needs of the space, and/or there is too much inertia or turf-war to switch away from specialized development systems.

Metadata: I was delighted to see a talk about metadata by Carol Farrant, whose team is interested in collecting, analyzing, and using metadata in data management for the intelligence and military communities. Of the technologies I heard discussed during the workshop, this is the one whose core technologies are arguably the least developed in the research and commercial environments. Unfortunately her team is underfunded and understaffed, so she is actively seeking volunteers to help move things along. (She notes that in the past year she’s seen more volunteer interest on the topic than on anything else in her career.) This might be an opportunity for an academic to have a big influence on metadata use and tool development.

FPGAs: I’ve been a fan of programmable logic since working with FPGAs in Dr. Richard Chapman’s research lab at Auburn. The final speaker of the workshop, Jonathan Ellis, claimed that the moment is at hand for reconfigurable logic to be used the way it was always intended — specifically, actually reprogramming the chips (frequently) during normal operations. Vendors are currently working to make this possible (if I heard correctly: although the chips can support multiple independent execution units on them, they currently have to be completely wiped to be reprogrammed. Not for long.) FPGAs have come a long way in 10 years: he asserts that software toolkits for ease of programming and implementation — arguably the biggest barrier to their widespread use — are right around the corner.  He also noted that the current thinking is if you are building 100,000 or fewer units of something like cell phones, it’s more cost-effective and time-efficient to pump out FPGAs (instantly available and upgradable) than to send off for ASIC fabrication (expensive, two month lead time).
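The speaker’s 100,000-unit rule of thumb is a break-even calculation. An ASIC has a large up-front (NRE) cost but a lower per-unit cost; an FPGA has no NRE but costs more per unit. Here’s a minimal sketch that ignores the two-month lead time and field upgradability. The dollar figures in the example are hypothetical, picked so that the crossover lands at 100,000 units.

```python
def break_even_units(asic_nre: float, fpga_unit_cost: float, asic_unit_cost: float) -> float:
    """Volume at which an ASIC's lower unit cost has paid back its up-front NRE cost."""
    if fpga_unit_cost <= asic_unit_cost:
        return float("inf")  # the ASIC never catches up
    return asic_nre / (fpga_unit_cost - asic_unit_cost)


def cheaper_option(units: int, asic_nre: float, fpga_unit_cost: float, asic_unit_cost: float) -> str:
    """Compare total cost of each option for a given production volume."""
    fpga_total = units * fpga_unit_cost
    asic_total = asic_nre + units * asic_unit_cost
    return "FPGA" if fpga_total <= asic_total else "ASIC"


# Hypothetical: $2M NRE, $30 per FPGA, $10 per ASIC.
print(break_even_units(2_000_000, 30, 10))         # 100000.0
print(cheaper_option(50_000, 2_000_000, 30, 10))   # FPGA
```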

I thank the hosts of this event, Technology Training Corporation, for sending me a complimentary pass to attend the workshop. (This workshop was similar to the “cyber security conference” I attended in June.) Overall I would likely not attend this workshop again, as I (as a practitioner of basic and advanced research) am not really in their target audience. I think the people who would be interested are those involved in policy-level marketing and sales for large government contractors, Marc Krull, and government employees involved with large program development and management.

Filed under: reviews — JLG @ 8:53 pm

October 9, 2008

NSRC industry day

This week I attended the 5th annual industry day at the Networking and Security Research Center (NSRC) at Penn State University. The event was similar in format to other industry days I’ve attended (CMU, Stony Brook) but with a more focused core of industry guests, primarily from telecom companies and large government contractors.

My main interest was in the work of professors Trent Jaeger and Patrick McDaniel of the Systems and Internet Infrastructure Security (SIIS) laboratory. Their students are working on several projects of interest to Jagged, including:

Another NSRC focus is on wireless networking research (cellular, sensor, 802.11, vehicular, you name it). An upside of their work is that it is strongly focused on real-world problems reported by companies — for example, CDMA2000-WiMAX internetworking. A related downside is that it wasn’t clear what academic (basic research) lessons could be drawn from some of the work; some of the results felt limited in scope and applicability to only a specific problem.

All the posters from the industry day are available here:
http://nsrc.cse.psu.edu/id08.html

The most interesting and controversial talk at the event was a keynote by Mr. Steven Chabinsky, the deputy director of the Joint Interagency Cyber Task Force. He advanced the idea that we as a nation have let ourselves be “seduced” by technology. We have plowed ahead with deployments of untested and unreliable technology at critical infrastructure points without first fully understanding (or mitigating) the risks and consequences of failure. He called on us as researchers and companies to consider the full spectrum of threat, vulnerability, and consequence in our technological innovations. A lively discussion ensued after the talk regarding the economic incentives to deploy unreliable technology. Topics included:

  • Will better policy decisions be made when cyber risks are better understood? The speaker described a current inability to quantify risk as either an absolute or a comparative measurement. This is especially true in low-probability but extremely-high-damage scenarios such as directed attacks against components of the power grid. I think this is an excellent point. It highlights a mental gap between the way that engineers think of technology and the way that decision-makers compare among technologies. Perhaps the government should fund some new studies along these lines?
  • Where should the government draw the line between regulation and deregulation? The government could take several non-regulatory actions to help companies develop hardened products (say, products that control water processing plants). One example is making supplemental development grants available to companies whose technology will be used in critical infrastructure. On one hand, I feel that government should more actively oversee and regulate (and pay for) these kinds of technologies. On the other hand, perhaps the problem is more complex than I realize. For example, perhaps one gets a qualitatively better product through open-market competition than one would through contract specification and regulatory compliance. Anyone have an opinion on this?
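One reason comparative risk measurement is so hard shows up in the simplest possible metric, expected annual loss (probability × impact). It can rank a frequent nuisance and a rare catastrophe as exactly equal, even though no decision-maker would treat them the same way. The scenarios and numbers below are entirely hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    annual_probability: float  # hypothetical chance of occurring in a given year
    impact_dollars: float      # hypothetical damage if it does occur

    def expected_annual_loss(self) -> float:
        """Classic expected-loss metric: probability times impact."""
        return self.annual_probability * self.impact_dollars


nuisance = Scenario("routine malware cleanup", 0.5, 100_000)
catastrophe = Scenario("directed attack on grid components", 0.0001, 500_000_000)

for s in (nuisance, catastrophe):
    print(f"{s.name}: ${s.expected_annual_loss():,.0f}")  # both print $50,000
```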

Mr. Chabinsky’s point was underscored later in the day in a talk on the Ohio EVEREST voting study. Patrick McDaniel discussed how the Help America Vote Act effectively thrust an insufficiently-tested prototype technology (electronic voting machines) for a low-profit-margin customer (the government) into mandatory and widespread use in a critical environment (the legitimacy of our democracy) in only a few years. He concluded (as have Avi Rubin and others) that current systems are fundamentally flawed and unsecurable. In light of the above discussion, these fundamental flaws represent a failure of technologists (as well as many others) in two ways: (a) our inability to architect reliable systems and (b) our inability to adequately inform public policy officials of the true readiness of proposed technologies.

This latter problem — coherently describing and conveying the capabilities and limitations of computer systems in a non-expert human-comprehensible manner — is one of the topics that has long interested me, especially in the context of information sharing in sensitive or classified environments. Anyone want to join us in working on this problem?

Filed under: reviews — JLG @ 10:12 pm

August 26, 2008

High end computing workshop

In August 2008 I attended the HEC FSIO workshop on file system and I/O (FSIO) research in support of high-end computing (HEC).

This HEC focus was interesting for a systems guy like me: think “systems that run detailed atmospheric simulations for weather prediction” and similar environments where such words as “parallel”, “(peta)scale”, and “throughput” are bandied about. (Sample presentation title: Improving scalability in parallel file systems for high end computing.)

The primary attendees and presenters were academic PIs funded under a joint NSF/DOE program called HECURA. This program chooses a new theme each year for its solicitations: last year’s was compilers; this fall’s will be FSIO (as it was three years ago). All presentations from this workshop are available here:

The work was all interesting but old; most of the work had been presented and discussed at the great conferences of yore. What I ended up enjoying the most from this workshop was an “Industry Storage Device Research Panel” with two fabulous presentations:

The above two talks are a great introduction to, respectively, the future of magnetic storage & the future of alternatives to magnetic storage.

The most interesting thing I learned is DOE’s archival storage model. If you want to archive something, you FTP PUT it onto an enormous server containing everything else that’s been archived in the last 60 years. If you want to retrieve it, you FTP GET it. (I didn’t learn how you locate the item you want, but there must be a standard naming scheme or an index — if you know please send me a note.) I chatted briefly with Mark Gary, data storage group leader at LLNL, about the differences between that model and all the digital preservation issues we touched upon in the class I co-taught this Spring (metadata generation, textual normalization, ontology standardization, language translation, QoS, security, access methods, historical ingest, etc.) Mark made the point that their KISS approach, while limited in functionality at first glance, both works well and continues to do exactly what their users need.

Filed under: reviews — JLG @ 3:44 pm

July 9, 2008

Cyber security conference

In June 2008 I attended a “Cyber Security Conference” in Arlington, Virginia.  The format was two days of invited 35-minute presentations by big names in the government and government-contractor space.  I only attended day two, so I missed half the discussion.  Here are some of the major themes from that day’s twelve speakers:

  • Targeted phishing (a.k.a. “spear phishing” or “whaling”—can we as a community agree to stop coming up with terrible nouns like these?) was mentioned more often by more people than any other cyber security problem.  Targeted phishing is a social engineering attack where someone learns enough about you (or your work environment) to send you a custom-made email.  One example involved a newly-promoted CFO, where the evildoers read about the CFO’s promotion in a newspaper and wrote a letter from “HR” asking (successfully) for personal information, passwords, etc., in order to set up the new executive’s computer account.  Four of the speakers mentioned phishing as one of the top problems they are facing on corporate and government networks…
  • …which reminds me that two speakers complained that spending/effort on cyber security is not well-balanced among the actual risks.  Joshua Corman of IBM phrased it nicely by pointing out that cyber attacks merely for the sake of attacking (“prestige” attacks) ended in 2004.  Attacks since then appear to have been driven by either financial (“profit”) or, more recently, activist (“political”) motives.  The problem is that the bulk of cyber security efforts/dollars go toward thwarting attackers that are easy to identify (worms, spam), leaving us exposed to more discreet attackers.  (Of course, nobody had a ready solution for how to identify and thwart these discreet attackers: a discrete problem.)
  • However, two speakers independently mentioned anomaly detection as an it-continues-to-be-promising approach to cyber security, while acknowledging that the false positive problem continues to plague real-world systems.  One of the core problems I’d like to see studied involves the characterization of real-world network traffic (especially in military environments).  Specifically, for how long after training does an anomaly detection model remain valid in an operational system: seconds? hours? weeks?
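One way to attack the "how long does a model stay valid" question is to keep scoring traffic you believe is benign and watch the false positive rate. When it climbs, the model has gone stale and needs retraining. Here’s a minimal sketch using a simple z-score detector as a stand-in for whatever anomaly detector an operational system actually runs. The detector, the threshold `k`, and the traffic numbers are all hypothetical.

```python
from statistics import mean, pstdev


class ZScoreDetector:
    """Flag observations more than k standard deviations from a trained baseline."""

    def __init__(self, k: float = 3.0):
        self.k = k
        self.mu = 0.0
        self.sigma = 1.0

    def train(self, samples):
        self.mu = mean(samples)
        self.sigma = pstdev(samples) or 1.0  # avoid divide-by-zero on constant data

    def is_anomalous(self, x: float) -> bool:
        return abs(x - self.mu) / self.sigma > self.k


def false_positive_rate(detector, benign_samples):
    """Fraction of known-benign samples that the detector flags."""
    flagged = sum(detector.is_anomalous(x) for x in benign_samples)
    return flagged / len(benign_samples)


det = ZScoreDetector()
det.train([100, 102, 98, 101, 99])                  # e.g., connections per minute
print(false_positive_rate(det, [100, 101, 99]))     # 0.0: model still valid
print(false_positive_rate(det, [120, 121, 119]))    # 1.0: benign traffic drifted
det.train([120, 122, 118, 121, 119])                # retrain on recent traffic
print(false_positive_rate(det, [120, 121, 119]))    # 0.0
```

Measuring how quickly that rate climbs on real (especially military) traffic would answer the seconds/hours/weeks question empirically.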

Two talks I really enjoyed were from Boeing and Lockheed-Martin, in which a speaker from each described the organization and internal defense strategy (applied cyber security?) of his corporate network.  I appreciate when companies are willing to share these kinds of operational details to make researchers’ jobs easier: storage companies take note!  Unfortunately the talks were light on details, but they provided some interesting insight on email defense.  Tip #1: Outlook helpfully hides the domain name, aiding a phisher’s task, so write filters at the corporate mail server to block addresses like “jaggedtechno1ogy.com”.  Tip #2: many spam or phishing attacks come from newly-created domains, so write filters for this too.  I’ve mentioned previously that we should perhaps tolerate some inconvenience for the sake of computer defense, and these are good examples of that.  Two questions I’d like someone to answer:

  1. How can we coax corporate network managers to be willing to evaluate active response systems (e.g., attack the attacker) on production networks?  It is probably much easier to do there (legally) than on government networks.
  2. When will corporate networks deploy the security support services (admission control, identity verification, key management) that allow application programmers to focus on their core competencies instead of being security experts?  C’mon, folks, it’s 2008.
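The two email-filtering tips from the Boeing and Lockheed-Martin talks are easy to sketch. The first catches sender domains that become one of your own domains once common digit-for-letter swaps (like the "1" in "jaggedtechno1ogy.com") are undone. The second rejects mail from domains registered too recently. The substitution table and the 30-day threshold are my own assumptions. The registration date is assumed to come from a WHOIS lookup, which is not shown here.

```python
from datetime import date

# Hypothetical table of digits commonly substituted for letters in lookalike domains.
LOOKALIKES = str.maketrans({"1": "l", "0": "o", "3": "e", "5": "s"})


def is_lookalike(sender_domain: str, protected_domain: str) -> bool:
    """True if sender_domain differs from protected_domain but normalizes to it."""
    s = sender_domain.lower()
    p = protected_domain.lower()
    return s != p and s.translate(LOOKALIKES) == p.translate(LOOKALIKES)


def is_too_new(registered_on: date, today: date, min_age_days: int = 30) -> bool:
    """True if the sending domain was registered fewer than min_age_days ago."""
    return (today - registered_on).days < min_age_days


print(is_lookalike("jaggedtechno1ogy.com", "jaggedtechnology.com"))  # True
print(is_too_new(date(2008, 6, 20), date(2008, 6, 25)))             # True
```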

UPDATE:

Three people have mentioned that question #1 is unlikely to have an answer:

What are the corresponding real-world analogies?  When is it legal for me, personally, to respond to a physical threat?  Only when there is serious threat of harm to myself or someone else (or, in some states, my property). Otherwise, call the police (or the military). I doubt cyber-society will act much differently. But this does raise the question: where are the cyberpolice and the cyberDoD?

And everyone agrees that question #2 needs to happen, like, yesterday:

I think that the best answer as to why it hasn’t happened is related to cost. And, in this case, cost is directly related to usability for the sysadmins. If they can do username / password and be done with it, then they will. And they will only move to other measures if/when they are required to (e.g., corporate policy, liability concerns, etc). However, if one could find a way to overlay this security goodness onto an existing network in a way that is no harder (and perhaps even easier) than username / passwords, then they might want to do it. Especially if this overlay then allowed for a tangible benefit in terms of increased security of everything else.

Thanks, Greg and Bryan.

Filed under: reviews — JLG @ 11:48 pm

PDL visit day

In May 2008 I attended the PDL Spring Industry Visit Day in Pittsburgh, a workshop of sorts where students display their work in poster and demo form, industry visitors catch up with their old storage acquaintances, and everybody gets together for German food and beer afterward. (What’s not to like?)

Here are some of the larger tidbits I took away from the event:

1. Filesystem statistics survey

Garth Gibson organized a 5-year DOE institute, the Petascale Data Storage Institute, to explore issues of interest to folks like the national labs. A nifty thing they’re doing is putting together public repositories of useful data for storage researchers. For example, the Computer Failure Data Repository contains the data Garth and Bianca used for the MTTF FAST paper.

So, the latest one is the “filesystem statistics survey.” There is a tool that anyone can run and a repository where folks can upload their results. The types of results they’ve generated so far include:

  • In archival file systems (at the national labs), most space is consumed by a small number of large files: 90% of space is consumed by files 32MB or greater in size, whereas 90% of files are smaller than 32MB.
  • In 75% of the archival file systems, 80%-90% of the files consume less than 2KB apiece.

This is available at:
http://www.pdsi-scidac.org/fsstats/index.html
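The headline statistics from the survey are easy to compute yourself from a list of file sizes. This is not the fsstats tool’s code. It is a minimal sketch of how the two "90%" numbers above are defined, assuming the 32MB boundary is inclusive on the large side. The sample file system is hypothetical.

```python
MB = 1024 * 1024


def size_split(file_sizes, threshold=32 * MB):
    """Return (fraction of files smaller than threshold,
    fraction of total bytes held by files >= threshold)."""
    if not file_sizes:
        raise ValueError("need at least one file size")
    small_files = sum(1 for s in file_sizes if s < threshold)
    big_bytes = sum(s for s in file_sizes if s >= threshold)
    return small_files / len(file_sizes), big_bytes / sum(file_sizes)


# Hypothetical file system: 90 files of 1MB and 10 files of 100MB.
sizes = [1 * MB] * 90 + [100 * MB] * 10
frac_small_files, frac_big_bytes = size_split(sizes)
print(f"{frac_small_files:.0%} of files are < 32MB")           # 90% of files are < 32MB
print(f"{frac_big_bytes:.0%} of bytes are in files >= 32MB")   # 92% of bytes are in files >= 32MB
```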

2. Hadoop

I hadn’t heard about Hadoop before today (do I live under a rock? does everyone know what this is?). Hadoop is an open-source implementation of MapReduce, i.e., a toolset to help a user easily fire off map() and reduce() functions on his or her own cluster of heterogeneous boxes. An example from my favorite online encyclopedia: “The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw image TIFF data (stored in S3) into 1.1 million finished PDFs in the space of 24 hours at a computation cost of just $240.”
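For anyone else who has been under the same rock, here is the programming model in miniature. This is not Hadoop’s (Java) API. It is a single-process Python illustration of the map → shuffle → reduce pattern that Hadoop distributes across a cluster, using the classic word-count example. All function names are my own.

```python
from collections import defaultdict
from itertools import chain


def map_fn(document: str):
    """map(): emit (word, 1) for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def reduce_fn(word: str, counts):
    """reduce(): sum all the counts emitted for one word."""
    return word, sum(counts)


def run_mapreduce(documents):
    """Single-process stand-in for what Hadoop runs across many machines."""
    groups = defaultdict(list)  # the "shuffle" step: group emitted values by key
    for key, value in chain.from_iterable(map_fn(d) for d in documents):
        groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())


print(run_mapreduce(["the cat", "the hat"]))  # {'the': 2, 'cat': 1, 'hat': 1}
```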

So I guess distributed computing is just getting easier and easier. One of my colleagues was setting up a Condor cluster just as I was leaving CMU so I didn’t get to learn a lot about it or see it in action. If you have experience with Condor or Hadoop I’d appreciate your giving me an overview sometime.

My favorite Hadoop-related project applied the “fingerpointing” technique (from Priya Narasimhan and her students) to identify in real time which nodes are the source of performance slowdowns in a Hadoop-based system. Fingerpointing is their take on failure detection and root-cause analysis in distributed systems, described here:
http://www.ece.cmu.edu/~fingerpointing/

One of the topics I care about (related to #1) is using what auditable information has been collected about a system to actually do some useful auditing, which is why I’m interested in this particular work.
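To give a flavor of that kind of auditing, here is a simplified peer-comparison heuristic. It assumes that nodes doing similar work in a healthy cluster should look alike, so a node whose metric strays far from the peer median is a suspect. This is my own simplification and not the fingerpointing group’s actual algorithm. The metric (seconds per task), the node names, and the `tolerance` parameter are hypothetical.

```python
from statistics import median


def suspect_nodes(metric_by_node, tolerance=0.5):
    """Flag nodes whose metric deviates from the peer median by more than
    `tolerance` as a fraction of that median. Returns no suspects if the
    median is zero, since relative deviation is undefined there."""
    mid = median(metric_by_node.values())
    return sorted(
        node for node, value in metric_by_node.items()
        if mid and abs(value - mid) / mid > tolerance
    )


# Hypothetical per-node average seconds per map task.
print(suspect_nodes({"node1": 10.0, "node2": 11.0, "node3": 9.5, "node4": 30.0}))
# ['node4']
```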

3. Home media storage

My favorite of the projects is “Perspective”, described here:
http://www.pdl.cmu.edu/HomeStorage/

They are looking at information stored in home media environments and asking questions about how real users want to interact with their storage. For example, how easy is it to accomplish tasks such as “make sure a movie is on Randal’s iPod before he leaves for his upcoming trip” or “make sure this set of files in Zach’s JPEG archive can’t be viewed by anyone else in his household”?

User studies in computer science are an underdeveloped area. I got really interested in them after I saw some work at IBM (the Sparcle project, linked below) that used a user study to see how well computer-literate people were able to specify access control policies. A lot of CS work suffers from a lack of user-centric design, so I’m happy to see any work that tries to address the problem. Sparcle is here: http://domino.research.ibm.com/comm/research_projects.nsf/pages/sparcle.index.html

Filed under: reviews — JLG @ 11:17 pm