<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>jagged thoughts</title>
	<atom:link href="http://jaggedtechnology.com/people/john.griffin/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://jaggedtechnology.com/people/john.griffin/blog</link>
	<description>Hot computer systems observations and analyses from John Linwood Griffin</description>
	<lastBuildDate>Sat, 14 Nov 2009 01:31:08 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.6</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>CCS 2009</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2009/11/13/ccs-2009/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2009/11/13/ccs-2009/#comments</comments>
		<pubDate>Sat, 14 Nov 2009 01:31:08 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=15</guid>
		<description><![CDATA[16th Conference on Computer and Communications Security (CCS&#8217;09)
Chicago, Illinois
November 9-13, 2009
CCS is one of the top international security conferences (example topics: detecting kernel rootkits, RFID, privacy and anonymization networks, botnets, cryptography).  It is held annually in November.  This year there were 315 submitted papers from 31 countries, of which 18% were accepted after peer review.
I&#8217;ve [...]]]></description>
			<content:encoded><![CDATA[<p>16th Conference on Computer and Communications Security (CCS&#8217;09)<br />
Chicago, Illinois<br />
November 9-13, 2009</p>
<p>CCS is one of the top international security conferences (example topics: detecting kernel rootkits, RFID, privacy and anonymization networks, botnets, cryptography).  It is held annually in November.  This year there were 315 submitted papers from 31 countries, of which 18% were accepted after peer review.</p>
<p>I&#8217;ve attended CCS twice (2006 and 2009).  It is one of the best conferences I&#8217;ve ever attended &#8212; I find that the speakers describe practical, cutting edge, informative results; I keep up with old acquaintances and meet new ones; I keep sharp and up-to-date as a research scientist.</p>
<p>Here are some of the major themes from this year:</p>
<p>* ASCII-compliant shellcode:  My favorite paper of the conference is &#8220;English Shellcode&#8221;, in which the authors developed a tool that takes malicious software as input and converts it into REAL ENGLISH PHRASES (taken from Wikipedia and Project Gutenberg) that execute natively on 32-bit x86.  If you read no other paper this year, you simply must read this one; it is wack incredible.  There was another paper that uses only valid ASCII characters for shellcode on the ARM architecture.  These demonstrations are important because ASCII (and especially English ASCII) is likely to be passed through by network intrusion detection systems.  The favorite paper is here:<br />
http://www.cs.jhu.edu/~sam/ccs243-mason.pdf</p>
<p>* Cloud computing:  Few authors of cloud-related papers seemed to address the cloudiness of their work, instead (and disappointingly) discussing generic distributed computing principles under a cloud umbrella.  The best cloud talk I saw was by Ian Foster, an invited speaker at the cloud security workshop, who described the transition from grid computing to cloud computing thus: grid was about federation, cloud is about infrastructure and hosting.  He pointed out that the grid folks did a good job of developing applications (e.g., for medical research) and executing analyses, but that it is the advent of data distribution and sharing in the cloud that is the game-changer.</p>
<p>* Anonymous communication:  There were several talks analyzing the efficacy of anonymization networks (mix networks, remailers, Tor, onion routing).  My takeaway is that these techniques work very well for latency-insensitive traffic (such as email), only moderately well for latency-sensitive traffic (such as web browsing), and not very well yet for high-bandwidth traffic (such as VoIP).  My favorite work was a poster on &#8220;Preventing SSL Traffic Analysis with Realistic Cover Traffic&#8221; (Nabil Schear and Nikita Borisov) where the authors change the statistical profile of your encrypted traffic such that existing analyses (such as measuring keystroke latencies) are impossible.</p>
<p>* Off-client emulation:  Several speakers described a technique for client-server applications (such as game clients running on customers&#8217; home computers) that helps to ensure the correctness, robustness, or speed of the client application.  It&#8217;s impractical to run a complete copy of the client on the server (because one server handles many clients) so the authors generally create minimalist versions of the client (for example, a game client that contains no rendering code) that are server-efficient.  In the game example, the client would send the user&#8217;s commands (&#8220;turn left, walk forward&#8221;) to the server, where the minimalist client would verify that those commands didn&#8217;t result in an invalid state (such as walking through a wall) that would indicate cheating by the player.</p>
<p>* Function-call graphs:  This is a well-known technique for tracing how an application executes (by creating a graph of the application&#8217;s control flow).  The technique kept popping up during the conference: using the graphs to identify when someone has violated your software license and included your source code in their application; using them inside a hypervisor to identify when a kernel rootkit is present in a virtual machine (due to the different hypercalls).  One attendee I had lunch with was very critical of the function-call graph technique (using an argument I didn&#8217;t really follow) but otherwise the technique seems useful.</p>
<p>* Power grids:  The currently-hot topic in security research is power grids and smart meters.  There are projects at least at Penn State, Carnegie Mellon, Johns Hopkins, and I&#8217;m certain many other places.  There was a tutorial, a paper, and several posters all discussing security issues in the power grid.  The most interesting aspect to me was attacks against state estimators: the researchers described techniques to manipulate the system components involved in measuring and predicting the state of generators, transmission lines, etc.  However, the research community still suffers from a dearth of real-world information about how these networks operate and where the real vulnerabilities might be.</p>
<p>* RFID:  As we already know, it is possible to do RFID well but none of the actual deployed RFID implementations do it well.  One classic observation by a speaker was of the RFID-enabled driver&#8217;s licenses issued in Washington State (in advance of the Winter Olympics) that include a KILL command that&#8217;s supposed to be set with a unique PIN but in reality is unset (using a default PIN)&#8230;meaning that anyone with a transmitter and sufficient power could kill a device.</p>
<p>* Ethical standards for security researchers:  One paper raised an ethical issue in its appendix (how can we do security research inside Amazon&#8217;s cloud computing infrastructure in a manner that doesn&#8217;t violate their terms of service?) and some researchers from the Stevens Institute have published a report and are organizing a workshop to investigate ethical standards for security researchers.  I didn&#8217;t really agree with many of the points made (my ethical line is drawn much further to the left: security researchers should have few constraints) but it was a hotly discussed and debated issue during the session breaks.</p>
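<p>The off-client emulation idea above can be sketched in a few lines of Python.  This is only a minimal illustration under stated assumptions, not any speaker&#8217;s actual system: the grid map, the PlayerState type, and the function names are all hypothetical.  The server-side &#8220;minimalist client&#8221; replays the command stream with no rendering code and rejects any stream that reaches an invalid state.</p>
<pre>
```python
# Hypothetical server-side "minimalist client": it tracks only position and
# heading on a small grid map and contains no rendering code.
from dataclasses import dataclass

# Assumed toy map: '#' is a wall, '.' is open floor. The border is all wall,
# so a player that starts inside can never index outside the grid.
GRID = [
    "#####",
    "#...#",
    "#.#.#",
    "#...#",
    "#####",
]

# Headings in clockwise order as (row delta, column delta): N, E, S, W.
HEADINGS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


@dataclass
class PlayerState:
    row: int
    col: int
    heading: int = 0  # index into HEADINGS, 0 = north


def apply_command(state, command):
    """Return the next state, or raise ValueError if the move is impossible."""
    if command == "turn left":
        return PlayerState(state.row, state.col, (state.heading - 1) % 4)
    if command == "turn right":
        return PlayerState(state.row, state.col, (state.heading + 1) % 4)
    if command == "walk forward":
        d_row, d_col = HEADINGS[state.heading]
        row, col = state.row + d_row, state.col + d_col
        if GRID[row][col] == "#":
            raise ValueError("command would walk through a wall")
        return PlayerState(row, col, state.heading)
    raise ValueError("unknown command: " + command)


def validate(start, commands):
    """Replay a client's command stream; True only if every state is valid."""
    state = start
    try:
        for command in commands:
            state = apply_command(state, command)
    except ValueError:
        return False
    return True
```
</pre>
<p>For example, starting at row 1, column 1 facing north, the stream &#8220;turn right, walk forward&#8221; validates, while a bare &#8220;walk forward&#8221; is rejected because it would step into the wall.</p>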
<p>Wolfram Schulte at Microsoft Research gave an invited workshop talk on their Singularity OS project (reinventing the OS from scratch; using software-enforced isolation instead of relying on hardware memory management techniques).  It&#8217;s an interesting project but impractical, since it would require a widescale change by developers in such a way that very little development would happen for a while.  The work was inspired by his team&#8217;s frustration with using best-practices formal verification (etc.) techniques for software development &#8212; or, taken another way, it was so frustrating when a blue-sky team tried to use existing techniques to develop and prove major software projects that they gave up.  That doesn&#8217;t bode well for using those techniques extensively in any real-world software development project (although they can still be very useful and insightful&#8230;just frustrating).</p>
<p>Also a shout-out to my student Brendan O&#8217;Connor for delivering a well-received talk on stock markets for reputation at the digital identity workshop.</p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2009/11/13/ccs-2009/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Information Assurance Conference</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2008/11/14/information-assurance-conference/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2008/11/14/information-assurance-conference/#comments</comments>
		<pubDate>Sat, 15 Nov 2008 00:53:05 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=14</guid>
		<description><![CDATA[In November 2008 I attended an &#8220;Information Assurance Conference&#8221; in Arlington, Virginia. This was a non-refereed two-day workshop of 30-minute talks on policy-level IA issues in the DoD and homeland security environments.  The most interesting takeaways were:

If you are an organization that wants information assurance, give someone the high-level independent power to veto (or [...]]]></description>
			<content:encoded><![CDATA[<p>In November 2008 I attended an &#8220;Information Assurance Conference&#8221; in Arlington, Virginia. This was a non-refereed two-day workshop of 30-minute talks on policy-level IA issues in the DoD and homeland security environments.  The most interesting takeaways were:</p>
<ul>
<li><strong>If you are an organization that wants information assurance, give someone the high-level independent power to veto (or vet) which applications are allowed to use the network.</strong></li>
</ul>
<p style="padding-left: 30px;">The U.S. Marine Corps has <a href="http://www.marines.mil/news/messages/Pages/2006/ESTABLISHMENT%20OF%20THE%20MARINE%20CORPS%20INFORMATION%20ASSURANCE%20DIVISION.aspx">an outstanding example of this power used successfully</a>: &#8220;the HQMC IA division will be the single point of contact within the marine corps for IA program, policy matters and oversight&#8230;[Mr. Ray Letteer] has authority to approve or disapprove an application or system for connection to [all Marine Corps core networks].&#8221;  And, according to Ray (the speaker), the USMC really has given him the teeth to enforce his team&#8217;s IA policies.</p>
<p style="padding-left: 30px;">Such a position of course requires diplomacy and tact: Ray mentions that he carefully vets the classifications of potential vulnerabilities to make sure only applications with demonstrable and unmitigatable vulnerabilities are ultimately banned from the network; he describes his role as translating geek-speak to the senior officers to convey the need for the restrictions his team enforces.</p>
<p style="padding-left: 30px;">After a cursory look I feel that this USMC approach could serve as a best-practices reference model for many other large organizations.  Another speaker noted that the traditional corporate and DoD approach is to have local administration (each division-sized entity has its own IA unit as part of its IT function), whereas the military is moving toward a single unifying enforcement point staffed by well-trained operators.  (I asked &#8220;isn&#8217;t homogeneity terrifying?&#8221;; other speakers responded that homogeneity doesn&#8217;t have to mean single-point-of-failure &#8212; they are not talking about one point of deployment, they are talking about unified policy across all points of deployment.)</p>
<ul>
<li><strong>If you need an ROI (return on investment) story to sell an IA strategy to your management, you&#8217;re in luck.</strong></li>
</ul>
<p style="padding-left: 30px;">Three speakers emphasized the availability of ROI metrics.  Joe Jarzombek described the <a href="https://buildsecurityin.us-cert.gov/swa/">free software assurance tools</a> that are available from the Department of Homeland Security.  As part of that effort DHS published seven articles on making a business case for software assurance (sample title &#8220;A Common Sense Way to Make the Business Case for Software Assurance&#8221;; click on the &#8220;Business Case&#8221; link at the above site) and recently held a workshop on the topic.</p>
<p style="padding-left: 30px;">Two other speakers suggested taking a nonstandard approach in selling security investments to your upper management: instead of justifying your existence, focus on demonstrating your continued competence.  For example, present graphical weekly metrics of how many port scans you thwarted or how many new security vulnerabilities were announced by antivirus companies that you prevented from affecting your network.</p>
<p style="padding-left: 30px;">Or, pick some of the low-hanging fruit to impress the bosses: Dr. Eric Cole of Lockheed-Martin mentioned a client engagement where his team was asked to suggest architectural changes to a network that was operating at 99% utilization.  After looking at the network traffic, his team simply blocked 74% of the outgoing connections (i.e., those connections which could not be traced to a business purpose).  Nobody complained, and the utilization was reduced to 55% at no cost to the customer.</p>
<ul>
<li><strong>If you are not a member of senior management, you need to learn to speak the language of senior management.</strong></li>
</ul>
<p style="padding-left: 30px;">This theme came up over and over during the workshop.  &#8220;Speak the language of executives &#8212; translate your geek-speak into business objectives!&#8221;  All I can say is: I agree.</p>
<p>Four other quick notes from the workshop:</p>
<p style="padding-left: 30px;"><strong>Whitelisting:</strong> One speaker mentioned a trend toward whitelisting web sites as a means of IA in military computer networks.  (Whitelisting is enumerating the list of acceptable sites and denying access to any other sites.)  I hadn&#8217;t heard that before &#8212; can anyone confirm you&#8217;re seeing this?</p>
<p style="padding-left: 30px;"><strong>COTS:</strong> Is COTS still on the rise?  Some speakers and attendees noted a trend toward COTS software and hardware, chiefly for the purchase costs and especially the (comparatively low) maintenance costs.  Others noted that there remain many applications, especially in classified domains, where commercial vendors are unwilling to tweak their product to fit the needs of the space, and/or there is too much inertia or turf-war to switch away from specialized development systems.</p>
<p style="padding-left: 30px;"><strong>Metadata:</strong> I was delighted to see a talk about metadata by Carol Farrant, whose team is interested in collecting, analyzing, and using metadata in data management for the intelligence and military communities. Of the technologies I heard discussed during the workshop, this is the one whose core technologies are arguably the least developed in the research and commercial environments. Unfortunately her team is underfunded and understaffed, so she is actively seeking volunteers to help move things along.  (She notes that in the past year she&#8217;s seen more volunteer interest on the topic than on anything else in her career.)  This might be an opportunity for an academic to have a big influence on metadata use and tool development.</p>
<p style="padding-left: 30px;"><strong>FPGAs:</strong> I&#8217;ve been a fan of programmable logic since working with FPGAs in Dr. Richard Chapman&#8217;s research lab at Auburn. The final speaker of the workshop, Jonathan Ellis, claimed that the moment is at hand for reconfigurable logic to be used the way it was always intended &#8212; specifically, actually reprogramming the chips (frequently) during normal operations. Vendors are currently working to make this possible (if I heard correctly: although the chips can support multiple independent execution units on them, they currently have to be completely wiped to be reprogrammed. Not for long.) FPGAs have come a long way in 10 years: he asserts that software toolkits for ease of programming and implementation &#8212; arguably the biggest barrier to their widespread use &#8212; are right around the corner.  He also noted that the current thinking is if you are building 100,000 or fewer units of something like cell phones, it&#8217;s more cost-effective and time-efficient to pump out FPGAs (instantly available and upgradable) than to send off for ASIC fabrication (expensive, two month lead time).</p>
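<p style="padding-left: 30px;">The whitelisting note above boils down to a default-deny check.  Here is a minimal Python sketch of that check; the ALLOWED_HOSTS set and the is_allowed function are hypothetical names for illustration, and a real deployment would presumably enforce the policy at a proxy or gateway rather than in a script.</p>
<pre>
```python
from urllib.parse import urlsplit

# Hypothetical allowlist for illustration; a real one would come from policy.
ALLOWED_HOSTS = {"www.marines.mil", "buildsecurityin.us-cert.gov"}


def is_allowed(url):
    """Default-deny: permit a URL only if its host is explicitly listed."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
    return host is not None and host in ALLOWED_HOSTS
```
</pre>
<p style="padding-left: 30px;">Anything not enumerated is refused, including URLs with no recognizable host, which is the property that separates a whitelist from a blacklist.</p>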
<p>I thank the hosts of this event, Technology Training Corporation, for sending me a complimentary pass to attend the workshop. (This workshop was similar to the &#8220;cyber security conference&#8221; I attended in June.) Overall I would likely not attend this workshop again, as I (as a practitioner of basic and advanced research) am not really in their target audience.  People who I think would be interested are people involved in policy-level marketing and sales for large government contractors, Marc Krull, and government employees involved with large program development and management.</p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2008/11/14/information-assurance-conference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>NSRC industry day</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2008/10/09/nsrc-industry-day/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2008/10/09/nsrc-industry-day/#comments</comments>
		<pubDate>Fri, 10 Oct 2008 02:12:33 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=13</guid>
		<description><![CDATA[This week I attended the 5th annual industry day at the Networking and Security Research Center (NSRC) at Penn State University.  The event was similar in format to other industry days I&#8217;ve attended (CMU, Stony Brook) but with a more focused core of industry guests, primarily from telecom companies and large government contractors.
My main [...]]]></description>
			<content:encoded><![CDATA[<p>This week I attended the 5th annual industry day at the Networking and Security Research Center (NSRC) at Penn State University.  The event was similar in format to other industry days I&#8217;ve attended (CMU, Stony Brook) but with a more focused core of industry guests, primarily from telecom companies and large government contractors.</p>
<p>My main interest was in the work of professors Trent Jaeger and Patrick McDaniel of the Systems and Internet Infrastructure Security (SIIS) laboratory.  Their students are working on several projects of interest to Jagged, including:</p>
<ul>
<li><a href="http://nsrc.cse.psu.edu/slides/id08/NSRC_ID08_poster_kevin.pdf">Self-protecting storage devices</a> (a fresh take on our work on active threat detection and response in block-based storage)</li>
</ul>
<ul>
<li><a href="http://nsrc.cse.psu.edu/slides/id08/NSRC_ID08_poster_josh.pdf">Measuring system integrity using VM roots-of-trust</a></li>
</ul>
<p>Another NSRC focus is on wireless networking research (cellular, sensor, 802.11, vehicular, you name it).  An upside of their work is that it is strongly focused on real-world problems reported by companies &#8212; for example, <a href="http://nsrc.cse.psu.edu/slides/id08/NSRC_ID08_poster_michaellin_sprint.pdf">CDMA2000-WiMAX internetworking</a>.  A related downside is that it wasn&#8217;t clear what academic (basic research) lessons could be drawn from some of the work; some of the results felt limited in scope and applicability to only a specific problem.</p>
<p>All the posters from the industry day are available here: <a href="http://nsrc.cse.psu.edu/id08.html">http://nsrc.cse.psu.edu/id08.html</a></p>
<p>The most interesting and controversial talk at the event was a keynote by Mr. Steven Chabinsky, the deputy director of the Joint Interagency Cyber Task Force.  He advanced the idea that we as a nation have let ourselves be &#8220;seduced&#8221; by technology, by plowing ahead with deployments of untested and unreliable technology at critical infrastructure points without first fully understanding (or mitigating) the risks and consequences of failure.  He called on us as researchers and companies to consider the full spectrum of threat, vulnerability, and consequence in our technological innovations.  A lively discussion ensued after the talk regarding the economic incentives to deploy unreliable technology.  Several of the topics were:</p>
<ul>
<li><em>Will better policy decisions be made when cyber risks are better understood?</em> The speaker described a current lack of capabilities to quantify risk either as an absolute or a comparative measurement.  This is especially true in low-risk but extremely-high-damage scenarios such as directed attacks against components of the power grid.  I felt this observation makes an excellent point, and highlights a mental gap between the way that engineers think of technology and the way that decisionmakers compare among technologies.  Perhaps the government should fund some new studies along these lines?</li>
</ul>
<ul>
<li><em>Where should the government draw the line between regulation and deregulation?</em> There are several non-regulatory actions the government could take to constructively assist companies in developing hardened products (say, that control water processing plants), such as making supplemental development grants available to companies whose technology will be used in critical infrastructure. On one hand, I feel that government should more actively oversee and regulate (and pay for) these kinds of technologies. But perhaps the problem is more complex than I realize &#8212; e.g., perhaps one gets a qualitatively better product through open-market competition than one would through contract specification and regulatory compliance. Anyone have an opinion on this?</li>
</ul>
<p>Mr. Chabinsky&#8217;s point was underscored later in the day in a talk on the <a href="http://siis.cse.psu.edu/everest.html">Ohio EVEREST voting study</a>.  Patrick McDaniel discussed how the Help America Vote Act effectively caused an insufficiently-tested prototype technology (electronic voting machines) for a low-profit-margin customer (the government) to be thrust into mandatory and widespread use in a critical environment (the legitimacy of our democracy) in only a few years. He concluded (<a href="http://www.bravenewballot.org/">as have Avi Rubin</a> and others) that current systems are fundamentally flawed and unsecurable. In light of the above discussion, these fundamental flaws represent a failure of technologists (as well as many others) &#8212; both (a) in our inability to architect reliable systems and (b) in our inability to adequately inform public policy officials of the true readiness of proposed technologies.</p>
<p>This latter problem &#8212; coherently describing and conveying the capabilities and limitations of computer systems in a non-expert human-comprehensible manner &#8212; is one of the topics that has long interested me, especially in the context of information sharing in sensitive or classified environments.  Anyone want to join us in working on this problem?</p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2008/10/09/nsrc-industry-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>High end computing workshop</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2008/08/26/high-end-computing-workshop/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2008/08/26/high-end-computing-workshop/#comments</comments>
		<pubDate>Tue, 26 Aug 2008 19:44:08 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=11</guid>
		<description><![CDATA[In August 2008 I attended the HEC FSIO workshop on file system and I/O (FSIO) research in support of high-end computing (HEC).
This HEC focus was interesting for a systems guy like me &#8212;  think &#8220;systems that run detailed atmospheric simulations for weather prediction&#8221; and like environments where such words as &#8220;parallel&#8221;, &#8220;(peta)scale&#8221;, and &#8220;throughput&#8221; [...]]]></description>
			<content:encoded><![CDATA[<p>In August 2008 I attended the HEC FSIO workshop on file system and I/O (FSIO) research in support of high-end computing (HEC).</p>
<p>This HEC focus was interesting for a systems guy like me &#8212;  think &#8220;systems that run detailed atmospheric simulations for weather prediction&#8221; and like environments where such words as &#8220;parallel&#8221;, &#8220;(peta)scale&#8221;, and &#8220;throughput&#8221; are bandied about.  (Sample presentation title: <em>Improving scalability in parallel file systems for high end computing.</em>)</p>
<p>The primary attendees and presenters were academic PIs funded under a joint NSF/DOE program called HECURA.  This program chooses a new theme each year for its solicitations: last year&#8217;s was compilers; this fall&#8217;s will be FSIO (as it was three years ago).  All presentations from this workshop are available here:</p>
<ul>
<li><a href="http://institute.lanl.gov/hec-fsio/workshops/2008/">http://institute.lanl.gov/hec-fsio/workshops/2008/</a></li>
</ul>
<p>The work was all interesting but old; most of the work had been presented and discussed at the great conferences of yore.  What I ended up enjoying the most from this workshop was an &#8220;Industry Storage Device Research Panel&#8221; with two fabulous presentations:</p>
<ul>
<li><a href="http://institute.lanl.gov/hec-fsio/workshops/2008/presentations/day3/Aune-Aug2008-HDD-Trends-V4.pdf">Storage Device Trends</a> by Dave Aune (Seagate)</li>
</ul>
<ul>
<li><a href="http://institute.lanl.gov/hec-fsio/workshops/2008/presentations/day3/Wilcke-PanelTalkFlashSCM_fD.pdf">Flash and Storage Class Memories: Technology Overview &amp; Systems Impact</a> by Winfried Wilcke (IBM)</li>
</ul>
<p>The above two talks are a great introduction to, respectively, the future of magnetic storage &amp; the future of alternatives to magnetic storage.</p>
<p>The most interesting thing I learned is DOE&#8217;s archival storage model.  If you want to archive something, you FTP PUT it onto an enormous server containing everything else that&#8217;s been archived in the last 60 years.  If you want to retrieve it, you FTP GET it.  (I didn&#8217;t learn how you locate the item you want, but there must be a standard naming scheme or an index &#8212; if you know please send me a note.) I chatted briefly with Mark Gary, data storage group leader at LLNL, about the differences between that model and all the digital preservation issues we touched upon in the <a href="http://hssl.cs.jhu.edu/~randal/600.409/Syllabus.html">class I co-taught this spring</a> (metadata generation, textual normalization, ontology standardization, language translation, QoS, security, access methods, historical ingest, etc.).  Mark made the point that their KISS approach, while limited in functionality at first glance, both works well and continues to do exactly what their users need.</p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2008/08/26/high-end-computing-workshop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SIEVE: Software Insertion and Execution in Virtual Environments</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/11/sieve-software-insertion-and-execution-in-virtual-environments/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/11/sieve-software-insertion-and-execution-in-virtual-environments/#comments</comments>
		<pubDate>Fri, 11 Jul 2008 19:06:42 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[proposals]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=9</guid>
		<description><![CDATA[This research proposal by Jagged Technology was initially targeted to SBIR program OSD08-IA2. We continue to seek sponsorship for the proposed work. If you are interested in teaming with us on future proposals on related computer systems topics, please contact us.
1. Introduction
Computer system virtualization, long a staple of large-scale mainframe computing platforms, has exploded into [...]]]></description>
			<content:encoded><![CDATA[<p><span style="color: #ff0000;"><em>This research proposal by Jagged Technology was initially targeted to SBIR program <a href="http://www.dodsbir.net/sitis/archives_display_topic.asp?Bookmark=31864">OSD08-IA2</a>. We continue to seek sponsorship </em></span><span style="color: #ff0000;"><em>for the proposed work. If you are interested in teaming with us on future proposals on related computer systems topics, please <a href="http://jaggedtechnology.com/contact.html">contact us</a>.</em></span></p>
<h4>1. Introduction</h4>
<p>Computer system virtualization, long a staple of large-scale mainframe computing platforms, has exploded into the commodity and off-the-shelf markets due to the widespread and affordable availability of software and hardware products that enable virtualization. As with any major change, the implications of the widespread deployment of virtualization are yet ill-understood; work is urgently needed to quantify the new opportunities and risks posed by specific virtualization technologies.</p>
<p>In this “wild west” phase of the commodity use of virtualization technology, each of the three aspects of information security (availability, confidentiality, and integrity) is by necessity being revisited by research labs around the world. For example, here is a representative sampling of three ways information security changes under virtualization:</p>
<ul>
<li>Availability. Virtualization enables the straightforward deployment and management of redundant and widely-distributed software components, increasing the availability of a critical computer service. (The PI has prior experience in this space.)</li>
</ul>
<ul>
<li>Confidentiality. The use of hypervisors (a.k.a. virtual machine monitors) adds a new software protection domain to hardware platforms. This enables software workloads to run in isolation even when time-sharing the hardware with other workloads. (The PI has prior experience in this space.)</li>
</ul>
<ul>
<li>Integrity. Any software running within this new protection domain has privileged (superuser) access to system memory and hardware resources. This access enables the creation of new software auditing tools that unobtrusively monitor software workloads in a virtual machine.</li>
</ul>
<p>It is this last example, software in a privileged protection domain, that is of interest in this proposal. The desired outcome of this research is a quantification of an opportunity (injecting software into a virtual machine) and a risk (the detectability of such injection). Specifically, Jagged Technology proposes:</p>
<ul>
<li>In Phase I, we will develop a software tool that runs inside a privileged protection domain. [Footnote: This could be from a privileged VM (in what are known as Type I hypervisors), from a host OS (in Type II VMMs), or from the hypervisor itself. In the approach described later in this proposal, our tool runs in a privileged VM.] This tool will manipulate data structures in the memory of an ordinary protection domain (a virtual machine), in order to covertly load and run general-purpose software within the virtual machine.</li>
</ul>
<ul>
<li>In Phase II, we will explore ways to detect the covert general-purpose software or to thwart the operation of our stealth-insertion tool. For example, this may include obfuscating the method by which the operating system maintains its queue of active and inactive processes, or accessing memory pages in a statistically random fashion to analyze the timing of page faults to those pages.</li>
</ul>
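<p>The timing-analysis idea mentioned for Phase II can be sketched in a few lines. The following is only an illustration of the analysis step, not part of the proposed tool: it assumes that per-page access latencies have already been collected somehow, and the function names, the latency values, and the threshold of 3.5 are all hypothetical. It visits pages in a shuffled order and flags pages whose median latency is a robust outlier relative to the rest.</p>

```python
import random
import statistics


def random_probe_order(num_pages, seed=None):
    """Return the page numbers 0..num_pages-1 in a shuffled order."""
    rng = random.Random(seed)
    order = list(range(num_pages))
    rng.shuffle(order)
    return order


def flag_slow_pages(samples, threshold=3.5):
    """Return page numbers whose median access latency is a robust outlier.

    samples maps a page number to a list of measured latencies (any unit).
    The score is the modified z-score built on the median absolute
    deviation (MAD); the threshold is an assumed, tunable value.
    """
    medians = {page: statistics.median(times) for page, times in samples.items()}
    center = statistics.median(medians.values())
    mad = statistics.median(abs(m - center) for m in medians.values())
    if mad == 0:
        return []  # no spread across pages, so nothing stands out
    flagged = []
    for page, m in sorted(medians.items()):
        score = 0.6745 * (m - center) / mad
        if score > threshold:
            flagged.append(page)
    return flagged


# Hypothetical latencies: page 4 is much slower than its neighbors.
example = {
    0: [100, 102, 101],
    1: [99, 101, 100],
    2: [101, 103, 102],
    3: [100, 100, 101],
    4: [900, 950, 920],
}
print(flag_slow_pages(example))
```

<p>A median-based score is used here so that a single slow page does not inflate the baseline it is compared against; a real detector would also have to separate interference from ordinary paging and cache effects.</p>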
<p>By covert, we mean primarily that an adversary is unable to read the contents of the memory locations containing the instructions or data used by our covert software, and secondarily that evidence in audit and system logs is minimized or removed to reduce the odds that the covert software will be detected. We assume that this adversary is able to run software with full administrative privileges (root access) inside one or more non-privileged virtual machines on the hardware platform. It may also be possible to use our approach to monitor one privileged VM from another VM, addressing the case where an adversary has administrative privileges inside the trusted computing base. However, we focus only on non-privileged virtual machines—the location where most military and commercial applications will actually be run—in this proposal.</p>
<h4>2. Phase I objectives</h4>
<p>We identify four base objectives for our Phase I work. The first pair of objectives involves the covert loading of software in a virtual environment:</p>
<ul>
<li>Objective #1: Use software running in one virtual machine (VM–1) to insert executable code, including instructions and data, into another virtual machine (VM–2).</li>
</ul>
<ul>
<li>Objective #2: Prevent software in VM–2 from being able to read or otherwise obtain the instructions and data of the inserted code.</li>
</ul>
<p>The second pair of objectives involves the covert execution of software in a virtual environment:</p>
<ul>
<li>Objective #3: Use software running in VM–1 to cause the operating system in VM–2 to actually execute the inserted code.</li>
</ul>
<ul>
<li>Objective #4: Prevent software in VM–2 from using the operating system’s audit facilities to trace or track the execution of the inserted code.</li>
</ul>
<h4>3. Phase I work plan</h4>
<p>We propose work to build a software tool that causes a rogue process to be covertly loaded into a Xen virtual machine running the Linux operating system, and further causes that rogue process to be covertly executed. This work addresses the four Phase I technical objectives and is organized into three tasks:</p>
<ul>
<li>Task I: Architecture and documentation. Develop prototype algorithms for modifying the Linux OS structures and Xen page table entries to support covert insertion and execution of software into a virtual machine.</li>
</ul>
<ul>
<li>Task II: Covert insertion (objectives #1 and #2). Create a minimal software prototype to insert code into a VM from the Domain-0 VM. [Footnote: Domain-0 is a privileged virtual machine containing the Xen command-line system management tools and also the device drivers that coordinate with the real system hardware.] Demonstrate how the memory pages used by the code are protected against unauthorized access from other software components inside the VM.</li>
</ul>
<ul>
<li>Task III: Covert execution (objectives #3 and #4). Extend the minimal software prototype to externally cause the OS inside the VM to schedule and execute the code. Demonstrate how the OS audit facilities are bypassed or modified to hide the execution of the code.</li>
</ul>
<p>The following figure gives a high-level view of the technique we will use to insert software into a virtual machine:</p>
<p><a href="http://jaggedtechnology.com/people/john.griffin/blog/wp-content/uploads/2008/07/proposal-graphic.png"><img class="aligncenter size-full wp-image-10" title="Overview of covert insertion and execution" src="http://jaggedtechnology.com/people/john.griffin/blog/wp-content/uploads/2008/07/proposal-graphic.png" alt="" width="500" height="340" /></a></p>
<h4>3.1 Developmental and experimental platform</h4>
<p>We plan to execute this project using the Linux operating system. We will covertly insert processes into a running instance of Linux and prevent other processes (or components of the Linux kernel) from accessing the memory of our covertly inserted processes. To create our virtual environment we will use the open-source Xen hypervisor (http://xen.org) deployed on systems with processors based on the Intel x86 architecture.</p>
<p>We note that nothing about our approach is specific to the use of Linux or Xen. We envision future work to extend the technology developed under this program into a multi-purpose tool that works with other operating system and hypervisor combinations.</p>
<p>Our approach does not depend on whether the processor supports hardware-assisted virtualization. Our tool will be designed to work on platforms that do or do not contain processors with Intel Virtualization Technology or with AMD Virtualization.</p>
<h4>3.2 Task I: Architecture and documentation</h4>
<p>We will create an architecture document describing the method by which our tool works. This document will be intended to provide the government with knowledge of this technique that lasts beyond the length of this contract. The document will describe the functional design of our prototype tool and provide software and hardware specifications for the execution environment in which our tool runs. As our prototype will be minimal due to the limited scope of the Phase I effort, this document will describe the tasks that will be necessary to refine and extend the prototype for commercial viability and to meet the Phase II objectives.</p>
<p>As part of this task we will create a list of candidate countermeasures to our tool’s operation, based on our insider knowledge from the process of creating the tool. This list will highlight the most promising of these countermeasures that could form a core component of the Phase II effort.</p>
<h4>3.3 Task II: Covert insertion</h4>
<p>By drawing on the introspection principles demonstrated by the Livewire prototype, the XenAccess monitoring library and other security projects designed to allow one VM to monitor another VM, we will create a well-documented minimal software prototype for covert insertion of a rogue process.</p>
<p>This prototype will use the Xen grant table page-management interface to map and modify the contents of the VM from within Domain-0. The prototype will analyze the Linux data structures inside the VM to create a rogue process and to determine which pages are currently available for assignment to the rogue process.</p>
<p>In our approach we will subtly modify the page table entries for the VM—specifically, changing entries to remove the “page present” bit—to cause Xen to trap page faults into our own page-fault handler. This approach is along the lines of the method used by the Shadow Walker rootkit to hide the existence of modified pages from the operating system.</p>
<p>As discussed in section 1, our primary definition of covert is that an adversary is unable to read the contents of the memory locations containing the instructions or data used by our covert software. Our approach fulfills this definition of covert: pages containing our data are accessible inside the virtual machine, but we monitor and gate which processes are allowed to read those pages by trapping the virtual machine’s page faults. If anyone other than our rogue process attempts to read the pages, we could overwrite the page contents before completing the page fault. We could even use this technique to have our rogue process share pages with legitimate processes, further obscuring the insertion of the rogue memory contents.</p>
<p>We expect that some modifications we will need to make to the OS data structures would ordinarily require full-kernel locking for safety. Locking can be a noticeable event, so we will attempt to create a tool that leverages other events in the virtual environment (such as domain scheduling) to obviate the need for explicit locking. Our prototype will first hook the Xen VM scheduler and domain-management infrastructure to passively determine at what times it is safe to make modifications to the domain’s memory (i.e., when the domain is blocked or otherwise not scheduled). We will then investigate an algorithm for making on-the-fly modifications to the VM without needing to first quiesce the VM, by monitoring which process is active inside the VM and determining which pages are in that process’ working set. If the process changes during our modifications, our prototype will have the option of immediately pausing the domain to complete its work while avoiding overt detection. [Footnote: A more detailed option, to be explored in Phase II, is monitoring which processes are currently running on which processors, and pausing the VM only when it unexpectedly enters kernel mode or otherwise performs potentially unsafe actions.]</p>
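<p>The gating logic described above can be illustrated with a toy, purely in-memory model. This sketch does not touch Xen, Linux, or real page tables; the class names <code>ToyPage</code> and <code>ToyMonitor</code>, the process IDs, and the zero-filled decoy contents are assumptions made only for illustration. Every read of a protected page is treated as a fault, and the handler decides which contents the requester sees.</p>

```python
from dataclasses import dataclass, field


@dataclass
class ToyPage:
    real: bytes             # contents the designated process should see
    decoy: bytes            # contents every other requester should see
    present: bool = False   # cleared, so every access is treated as a fault


@dataclass
class ToyMonitor:
    hidden_pid: int
    pages: dict = field(default_factory=dict)
    fault_log: list = field(default_factory=list)

    def protect(self, page_no, real, decoy=None):
        """Register a page whose accesses should be gated."""
        if decoy is None:
            decoy = bytes(len(real))  # zero-filled stand-in of the same length
        self.pages[page_no] = ToyPage(real=real, decoy=decoy)

    def read(self, pid, page_no):
        """Simulate a read: log the fault, then gate on the requester."""
        page = self.pages[page_no]
        if not page.present:
            self.fault_log.append((pid, page_no))
        return page.real if pid == self.hidden_pid else page.decoy


monitor = ToyMonitor(hidden_pid=42)
monitor.protect(7, b"secret")
print(monitor.read(42, 7))  # the designated process sees the real bytes
print(monitor.read(1, 7))   # any other requester sees the decoy
```

<p>The fault log in this model also hints at the cost of the approach: every gated access takes an extra trap, which is exactly the kind of timing evidence an adversary might look for.</p>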
<h4>3.4 Task III: Covert execution</h4>
<p>By drawing on the principles demonstrated by the Xenprobes library, the Kprobes framework, and other security projects designed to allow an administrator to hijack the scheduler, debug the operating system, or probe the execution of a Linux virtual machine, we will create a well-documented minimal software prototype for covert execution of a rogue process. This could involve either inserting a new process onto the scheduler’s queue, or if possible hijacking existing processes for brief amounts of time to cause them to execute work on our behalf.</p>
<p>The focus of this task is both to cause the process to be executed and to do what is possible to cover the tracks of that execution by limiting the amount of resources consumed by our rogue process and by modifying the audit logs kept by the operating system to erase any entries or values indicating that our process executed. This satisfies our secondary definition of covert, in that evidence in audit and system logs is minimized or removed to reduce the odds that the covert software will be detected. As we develop the software for this task, we will explore the limit of stealth execution—identifying what evidence is visible during the actual execution of a process (for example, system status entries in the /proc file system) and determining whether this real-time evidence can be spoofed or removed.</p>
<p>Quantifying the many ways that a process leaves evidence of its execution has benefits both in Phase I and in Phase II, where one of the tasks will be to develop countermeasures to a tool like ours based on information available to an adversary inside the virtual machine. There are many types of evidence that may be observable by an adversary. For example, one side effect of mapping one virtual machine’s memory page in another VM is that it could change the contents of the memory caches (in essence, VM–1 could potentially preload the L2 cache with data from pages that VM–2 will soon access), which would change the timing characteristics of operations that the second VM would perform. Our continued identification of these issues will be incorporated into the architecture document from Task I.</p>
<h4>3.5 Discussion and alternate approaches</h4>
<p>Our approach relies on the existence of an inviolate trusted computing base (TCB) on the target system—in other words, there is a privileged part of the overall virtualized system to which the adversary cannot ordinarily gain access. For example, operating systems inside a Xen guest domain (virtual machine) cannot ordinarily access memory owned by other virtual machines unless they are specifically authorized to do so by both the administrator of the other domain and by the hypervisor itself.</p>
<p>Requiring a TCB does not mean that we require any sort of Trusted Computing hardware or specialized software modules in the hypervisor. Rather, it emphasizes that our focus is on using virtualization to aid our task of hiding the existence of software from the most common-case vantage point where an adversary will have access. Once we have developed the capability to externally insert a process into a running kernel, our tool could be deployed in more elaborate environments such as on specialized external hardware.</p>
<p>There are a panoply of alternate approaches for the covert insertion of software. Two examples include:</p>
<ul>
<li>An interesting aspect of certain virtualization (and virtualization detection) hacks, as well as a number of exploits, is their reliance on unmapped (or differently mapped) access to physical memory from devices. With more and more powerful devices on the bus, many of them sporting their own CPUs, an alternate type of virtualization attack involves the embedding of “undetectable” code in the firmware of these devices. This is particularly applicable to network cards with protocol offload functionality, since these would have network access and be able to snoop traffic or play man-in-the-middle, in addition to looking at host memory. Of course, with hardware access, it would always be possible to read the firmware off the peripheral’s ROM. However, assuming that there is no practical way to scan the firmware from the host CPU, and assuming that reflashing the ROM requires cooperation from the device CPU, it could be possible for the code to incorporate itself into new firmware images loaded from the host. This would make it not only covert but difficult or impossible to eradicate without physical disassembly of the board.</li>
</ul>
<ul>
<li>Another approach involves encrypting the software and executing it on a protected hardware decryption offload device (or on a specially-modified CPU). This would allow for simple storage and loading of the software while defending against its reverse-engineering by the adversary. This does raise the question of where to store or how to securely load the decryption key, especially in a large distributed environment. Assuming that Trusted Computing hardware is available on the system, one potential solution would be to have a non-encrypted seed-loader-hypervisor that verifies its own integrity (and its execution outside of a virtual environment) and then obtains the decryption key from the network using peer-to-peer or other protocols.</li>
</ul>
<p>Both of the above approaches rely on the availability of special-purpose hardware to perform independent execution, decryption, or attestation. Requiring special-purpose hardware is technically practical, thanks to products such as the IBM 4758 secure cryptographic coprocessor card, but we anticipate that the cost-efficiency of the widespread deployment and management of the hardware will keep hardware-based solutions impractical for years to come. Our solution does not require specialized hardware and thus, at the expense of not protecting against an attacker with physical hardware access, is immediately practical.</p>
<p>A third example combines our approach with well-known recent work:</p>
<ul>
<li>Inserting a virtualization layer between the existing virtualization layer and the hardware—the technique demonstrated by the recent Blue Pill and SubVirt work on multi-layer virtual machine monitors—would allow our covert software insertion tool to run at a level more privileged than all other software on the system. This would enable us to use the techniques developed in Task II to insert protection software anywhere, including in the VMM, a privileged VM, or an ordinary VM. To be effective such an approach would likely need to be predeployed on all systems to be monitored (i.e., every system either currently runs multiple layers of VMMs or is able to be rebooted to install another layer on demand), and would also likely employ techniques that attempt to hide the fact that the extra virtualization layer exists.</li>
</ul>
<p>All currently-demonstrated technologies for running multiple layers of virtualization on an x86 system are highly specialized to the specific hypervisor technology deployed on the system. As these technologies are seemingly in a state of constant flux, we choose to focus this proposal on developing the techniques for the external insertion and scheduling of rogue software; once developed, these techniques may be placed and run anywhere, including in such an extra virtualization layer.</p>
<p>We note that our team has expertise in all three of these alternate areas. The technologies we develop under this program would certainly have relevance in any future research and development work that explores broader forms of software insertion and countermeasures to insertion.</p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/11/sieve-software-insertion-and-execution-in-virtual-environments/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Cyber security conference</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/09/cyber-security-conference/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/09/cyber-security-conference/#comments</comments>
		<pubDate>Thu, 10 Jul 2008 03:48:47 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=5</guid>
		<description><![CDATA[In June 2008 I attended a &#8220;Cyber Security Conference&#8221; in Arlington, Virginia.  The format was two days of invited 35-minute presentations by big names in the government and government-contractor space.  I only attended day two so I missed half the discussion.  Here are some of the major themes from today&#8217;s twelve speakers:

Targeted phishing (a.k.a. &#8220;spear [...]]]></description>
			<content:encoded><![CDATA[<p>In June 2008 I attended a &#8220;Cyber Security Conference&#8221; in Arlington, Virginia.  The format was two days of invited 35-minute presentations by big names in the government and government-contractor space.  I only attended day two so I missed half the discussion.  Here are some of the major themes from today&#8217;s twelve speakers:</p>
<ul>
<li>Targeted phishing (a.k.a. &#8220;spear phishing&#8221; or &#8220;whaling&#8221;&#8212;can we as a community agree to stop coming up with terrible nouns like these?) was mentioned more often by more people than any other cyber security problem.  Targeted phishing is a social engineering attack where someone learns enough about you (or your work environment) to send you a custom-made email.  One example involved a newly-promoted CFO, where the evildoers read about the CFO&#8217;s promotion in a newspaper and wrote a letter from &#8220;HR&#8221; asking (successfully) for personal information, passwords, etc., in order to set up the new executive&#8217;s computer account.  Four of the speakers mentioned phishing as one of the top problems they are facing on corporate and government networks&#8230;</li>
</ul>
<ul>
<li>&#8230;which reminds me that two speakers complained that spending/effort on cyber security is not well-balanced among the actual risks.  Joshua Corman of IBM phrased it nicely by pointing out that cyber attacks merely for the sake of attacking (&#8220;prestige&#8221; attacks) ended in 2004; attacks since then appear to have been driven either by financial (&#8220;profit&#8221;) or, more recently, activist (&#8220;political&#8221;) motives.  The problem is that the bulk of cyber security efforts/dollars are going to thwart attackers that are easy to identify (worms, spam) leaving us exposed to more discreet attackers.  (Of course, nobody had a ready solution for how to identify and thwart these discreet attackers&#8212;a discrete problem.)</li>
</ul>
<ul>
<li>However, two speakers independently mentioned anomaly detection as an it-continues-to-be-promising approach to cyber security, while acknowledging that the false positive problem continues to plague real-world systems.  One of the core problems I&#8217;d like to see studied involves the characterization of real-world network traffic (especially in military environments).  Specifically, for how long after training does an anomaly detection model remain valid in an operational system: seconds? hours? weeks?</li>
</ul>
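<p>One way to frame the model-aging question is to train a detector once and then track its false-positive rate on later windows of traffic that is known to be benign. The sketch below is a minimal illustration under strong assumptions: the &#8220;model&#8221; is just a mean and standard deviation of a single numeric feature, the alarm rule is a k-sigma test, and all of the numbers are made up rather than taken from any real network.</p>

```python
import statistics


def train(values):
    """Fit a trivial model: the mean and population standard deviation."""
    return statistics.mean(values), statistics.pstdev(values)


def false_positive_rates(model, benign_windows, k=3.0):
    """Return, per window, the fraction of benign samples that raise an alarm.

    A sample raises an alarm when it lies more than k standard deviations
    from the training mean. Every sample is assumed benign, so each alarm
    counts as a false positive.
    """
    mean, std = model
    rates = []
    for window in benign_windows:
        alarms = sum(1 for v in window if abs(v - mean) > k * std)
        rates.append(alarms / len(window))
    return rates


# Made-up feature values: the benign baseline drifts upward over time.
model = train([10, 12, 11, 9, 8, 10])
windows = [[10, 11, 9, 12], [12, 13, 14, 15], [16, 17, 18, 15]]
print(false_positive_rates(model, windows))
```

<p>The first window whose rate exceeds whatever the operators can tolerate gives a rough estimate of how long the model stayed valid for that feature.</p>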
<p>Two talks I really enjoyed were from Boeing and Lockheed-Martin, in which a speaker from each talked about the organization and internal defense strategy (applied cyber security?) of his corporate network.  I appreciate when companies are willing to share these kinds of operational details to make researchers&#8217; jobs easier: storage companies take note!  Unfortunately the talks were light on details but provided some interesting insight on email defense (#1: Outlook helpfully hides the domain name, aiding a phisher&#8217;s task, so write filters to block addresses like &#8220;jaggedtechno1ogy.com&#8221; at the corporate mail server; #2: many spams or phishing attacks come from newly-created domains, so write filters for this too&#8212;I&#8217;ve mentioned previously that we should perhaps tolerate some inconvenience for the sake of computer defense, and these are good examples of that).  Two questions I&#8217;d like someone to answer:</p>
<ol>
<li>How can we coax corporate network managers to be willing to evaluate active response systems (e.g., attack the attacker) on production networks?  It is probably much easier to do there (legally) than on government networks.</li>
<li>When will corporate networks deploy the security support services (admission control, identity verification, key management) that allow application programmers to focus on their core competencies instead of being security experts?  C&#8217;mon, folks, it&#8217;s 2008.</li>
</ol>
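<p>The two email filters mentioned above (lookalike domains and newly-created domains) are simple enough to sketch. This is a rough illustration, not what either speaker described running: the character substitutions, the 30-day cutoff, and the function names are my own assumptions, and a real filter would get the registration date from a WHOIS or similar lookup that is not shown here.</p>

```python
from datetime import date, timedelta

# Assumed substitutions a phisher might use; a real list would be longer.
CONFUSABLES = str.maketrans({"1": "l", "0": "o", "5": "s", "3": "e"})


def skeleton(domain):
    """Collapse visually similar characters so lookalikes compare equal."""
    return domain.lower().translate(CONFUSABLES).replace("rn", "m")


def is_lookalike(sender_domain, protected_domains):
    """True if the sender imitates, but is not, one of our own domains."""
    sender = sender_domain.lower()
    if sender in protected_domains:
        return False
    return any(skeleton(sender) == skeleton(p) for p in protected_domains)


def is_newly_registered(created_on, today, min_age=timedelta(days=30)):
    """True if the domain was registered less than min_age before today."""
    return today - created_on < min_age


ours = {"jaggedtechnology.com"}
print(is_lookalike("jaggedtechno1ogy.com", ours))  # imitation with a digit 1
print(is_lookalike("jaggedtechnology.com", ours))  # the genuine domain
print(is_newly_registered(date(2008, 6, 20), date(2008, 7, 1)))
```

<p>Both checks will occasionally block legitimate mail, which is the kind of inconvenience I was arguing we should be willing to tolerate.</p>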
<p><strong>UPDATE:</strong></p>
<p>Three people have mentioned that question #1 is unlikely to have an answer:</p>
<blockquote><p>What are the corresponding real-world analogies?  When is it legal for me, personally, to respond to a physical threat?  Only when there is serious threat of harm to myself or someone else (or, in some states, my property). Otherwise, call the police (or the military). I doubt cyber-society will act much different. But, this does beg the question of where are the cyberpolice and cyberDoD!</p></blockquote>
<p>And everyone agrees that question #2 needs to happen, like, yesterday:</p>
<blockquote><p>I think that the best answer as to why it hasn&#8217;t happened is related to cost.  And, in this case, cost is directly related to usability for the sysadmins.  If they can do username / password and be done with it, then they will.  And they will only move to other measures if/when they are required to (e.g., corporate policy, liability concerns, etc).  However, if one could find a way to overlay this security goodness onto an existing network in a way that is no harder (and perhaps even easier) than username / passwords, then they might want to do it.  Esp if this overlay then allowed for a tangible benefit in terms of increased security of everything else.</p></blockquote>
<p>Thanks, Greg and Bryan.</p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/09/cyber-security-conference/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PDL visit day</title>
		<link>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/09/pdl-visit-day/</link>
		<comments>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/09/pdl-visit-day/#comments</comments>
		<pubDate>Thu, 10 Jul 2008 03:17:46 +0000</pubDate>
		<dc:creator>JLG</dc:creator>
				<category><![CDATA[reviews]]></category>

		<guid isPermaLink="false">http://jaggedtechnology.com/people/john.griffin/blog/?p=4</guid>
		<description><![CDATA[In May 2008 I attended the PDL Spring Industry Visit Day in Pittsburgh, a workshop of sorts where students display their work in poster and demo form, industry visitors catch up with their old storage acquaintances, and everybody gets together for German food and beer afterward.  (What&#8217;s not to like?)
Here are some of the [...]]]></description>
			<content:encoded><![CDATA[<p>In May 2008 I attended the PDL Spring Industry Visit Day in Pittsburgh, a workshop of sorts where students display their work in poster and demo form, industry visitors catch up with their old storage acquaintances, and everybody gets together for German food and beer afterward.  (What&#8217;s not to like?)</p>
<p>Here are some of the larger tidbits I took away from the event:</p>
<p>1. Filesystems statistics survey</p>
<p style="padding-left: 30px;">Garth Gibson organized a 5-year DoE institute, the Petascale Data Storage Institute, to explore issues of interest to folks like the national labs.  A nifty thing they&#8217;re doing is putting together public repositories of useful data for storage researchers.  For example, the Computer Failure Data Repository contains the data Garth and Bianca used for the MTTF FAST paper.</p>
<p style="padding-left: 30px;">So, the latest one is the &#8220;filesystem statistics survey.&#8221;  There is a tool that anyone can run and a repository for folks to upload their results.  The kinds of results that they&#8217;ve generated so far are:</p>
<ul>
<li>In archival file systems (at the national labs), most space is consumed by a small number of large files: 90% of space is consumed by files 32MB or greater in size, whereas 90% of files are smaller than 32MB.</li>
</ul>
<ul>
<li>In 75% of the archival file systems, 80%-90% of the files consume less than 2KB apiece.</li>
</ul>
<p style="padding-left: 30px;">This is available at:<br />
<a href="http://www.pdsi-scidac.org/fsstats/index.html">http://www.pdsi-scidac.org/fsstats/index.html</a></p>
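<p style="padding-left: 30px;">To give a feel for how numbers like the ones above are computed, here is a small sketch. It is not the survey tool itself, whose interface I have not reproduced; the function names and the example sizes are made up. It walks a directory tree, collects file sizes, and reports what fraction of files fall below a size threshold and what fraction of bytes sit in files at or above it.</p>

```python
import os


def collect_sizes(root):
    """Return the sizes in bytes of all files found under root."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append(os.lstat(path).st_size)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sizes


def size_summary(sizes, threshold):
    """Return (fraction of files below threshold, fraction of bytes at or above it)."""
    total_bytes = sum(sizes)
    if not sizes or total_bytes == 0:
        raise ValueError("need at least one non-empty file")
    small = [s for s in sizes if s < threshold]
    frac_files_small = len(small) / len(sizes)
    frac_bytes_large = (total_bytes - sum(small)) / total_bytes
    return frac_files_small, frac_bytes_large


# Made-up example: nine 1KB files and one 64MB file, with a 32MB threshold.
example_sizes = [1024] * 9 + [64 * 2**20]
print(size_summary(example_sizes, 32 * 2**20))
```

<p style="padding-left: 30px;">To try it on a real tree, pass the output of <code>collect_sizes</code> for some directory to <code>size_summary</code>.</p>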
<p>2. Hadoop</p>
<p style="padding-left: 30px;">I hadn&#8217;t heard about Hadoop before today (do I live under a rock? does everyone know what this is?)  Hadoop is an open-source implementation of MapReduce &#8212; i.e., a toolset to help a user easily fire off map() and reduce() functions on his or her own cluster of heterogeneous boxes.  An example from my favorite online encyclopedia: &#8220;The New York Times used 100 Amazon EC2 instances and a Hadoop application to process 4TB of raw image TIFF data (stored in S3) into 1.1 million finished PDFs in the space of 24 hours at a computation cost of just $240.&#8221;</p>
<p style="padding-left: 30px;">So I guess distributed computing is just getting easier and easier.  One of my colleagues was setting up a Condor cluster just as I was leaving CMU so I didn&#8217;t get to learn a lot about it or see it in action.  If you have experience with Condor or Hadoop I&#8217;d appreciate your giving me an overview sometime.</p>
<p style="padding-left: 30px;">My favorite Hadoop-related project was applying the &#8220;fingerpointing&#8221; technique (from Priya Narasimhan and her students) to identify in real time which nodes are the source of performance slowdowns in a Hadoop-based system.  Fingerpointing is their take on failure detection and root-cause analysis in distributed systems, described here:<br />
<a href="http://www.ece.cmu.edu/~fingerpointing/">http://www.ece.cmu.edu/~fingerpointing/</a></p>
<p style="padding-left: 30px;">One of the topics I care about (related to #1) is using what auditable information has been collected about a system to actually do some useful auditing, which is why I&#8217;m interested in this particular work.</p>
<p>3. Home media storage</p>
<p style="padding-left: 30px;">My favorite of the projects is &#8220;Perspective&#8221;, described here:<br />
<a href="http://www.pdl.cmu.edu/HomeStorage/">http://www.pdl.cmu.edu/HomeStorage/</a></p>
<p style="padding-left: 30px;">They are looking at information stored in home media environments and asking questions about how real users want to interact with their storage: how easy is it to accomplish tasks such as &#8220;make sure a movie is on Randal&#8217;s ipod before he leaves for his upcoming trip&#8221; or &#8220;make sure this set of files in Zach&#8217;s JPEG archive can&#8217;t be viewed by anyone else in his household.&#8221;</p>
<p style="padding-left: 30px;">User studies in computer science is an underdeveloped field.  I got really interested in this after I saw some interesting work at IBM (the Sparcle project, linked below) that did a user study to see how well computer-literate people were able to specify access control policies. A lot of CS work suffers from a lack of user-centric design, so I&#8217;m happy to see any work that tries to address the problem.  Sparcle is here: <a href="http://domino.research.ibm.com/comm/research_projects.nsf/pages/sparcle.index.html">http://domino.research.ibm.com/comm/research_projects.nsf/pages/sparcle.index.html</a></p>
]]></content:encoded>
			<wfw:commentRss>http://jaggedtechnology.com/people/john.griffin/blog/2008/07/09/pdl-visit-day/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
