Monday, March 10, 2008

Hadoop - MapReduce

Welcome to Hadoop!

Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data.

Here's what makes Hadoop especially useful:

  • Scalable: Hadoop can reliably store and process petabytes.
  • Economical: It distributes the data and processing across clusters of commonly available computers. These clusters can number into the thousands of nodes.
  • Efficient: By distributing the data, Hadoop can process it in parallel on the nodes where the data is located, which makes it extremely fast.
  • Reliable: Hadoop automatically maintains multiple copies of data and automatically redeploys computing tasks based on failures.

Hadoop implements MapReduce, using the Hadoop Distributed File System (HDFS) (see the figure below). MapReduce divides applications into many small blocks of work. HDFS creates multiple replicas of data blocks for reliability, placing them on compute nodes around the cluster. MapReduce can then process the data where it is located.
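
To make the programming model concrete, here is a sketch of the canonical word-count job written against the classic org.apache.hadoop.mapred API (roughly as it stood in the 0.17/0.18 releases of this period; the WordCount class name and the input/output paths taken from args are illustrative, not from any particular application). Each map task runs against a local HDFS block and emits (word, 1) pairs; the framework sorts and groups the pairs by word, and each reduce task sums the counts for its share of the words.

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    public class WordCount {

        // Map phase: runs where the HDFS block lives and emits (word, 1) pairs.
        public static class Map extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            public void map(LongWritable key, Text value,
                            OutputCollector<Text, IntWritable> output,
                            Reporter reporter) throws IOException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    output.collect(word, ONE);
                }
            }
        }

        // Reduce phase: sums the counts collected for each word.
        public static class Reduce extends MapReduceBase
                implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values,
                               OutputCollector<Text, IntWritable> output,
                               Reporter reporter) throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(WordCount.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            conf.setMapperClass(Map.class);
            conf.setCombinerClass(Reduce.class); // pre-sum counts locally before the shuffle
            conf.setReducerClass(Reduce.class);
            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));
            JobClient.runJob(conf);
        }
    }

Packaged into a jar, a job like this is launched with something like bin/hadoop jar wordcount.jar WordCount in-dir out-dir; the framework schedules map tasks on nodes holding replicas of the input blocks, which is what keeps the computation next to the data.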

Hadoop has been demonstrated on clusters with 2000 nodes. The current design target is 10,000 node clusters.

For more information about Hadoop, please see the Hadoop wiki.

[Figure: Hadoop architecture]

Friday, March 7, 2008

Saving The Planet At The Speed Of Light: Green Cyber Infrastructure


How to address the challenges of CI uptake

Mar 7, 2008 11:30 am to 12:30 pm


Bill St. Arnaud is Director of Network Projects for CANARIE Inc., an industry-government consortium that promotes and develops information highway technologies in Canada. At CANARIE, he has been responsible for the coordination and implementation of Canada's next-generation Internet initiative, CA*net II. More recently, he has been coordinating the CANARIE initiative to build the world's first national optical Internet network, announced in the February 1998 Budget.

Previously, Bill St. Arnaud was the President and founder of a network and software engineering firm called TSA ProForma Inc. TSA was a LAN/WAN software company that developed wide-area network client/server systems, used primarily in the financial and information business fields in the Far East and the United States. In 1989, TSA was sold to business interests in Hong Kong and Toronto.

Bill St. Arnaud is a frequent guest speaker at conferences on the Internet and ATM networking, and is a regular contributor to several networking magazines.

Monday, March 3, 2008

Data-Intensive Super Computing: Taking Google-Style Computing Beyond Web Search

March 10, 2008 - Coates Hall 155


Randal E. Bryant, Carnegie Mellon University
Dean, School of Computer Science


Abstract

Web search engines have become fixtures in our society, but few people realize that they are actually publicly accessible supercomputing systems, where a single query can unleash the power of several hundred processors operating on a data set of over 200 terabytes. With Internet search, computing has risen to entirely new levels of scale, especially in terms of the sizes of the data sets involved. Google and its competitors have created a new class of large-scale computer systems, which we label "Data-Intensive Super Computer" (DISC) systems. DISC systems differ from conventional supercomputers in that their focus is on data: they acquire and maintain continually changing data sets, in addition to performing large-scale computations over the data.

With the massive amounts of data arising from such diverse sources as telescope imagery, medical records, online transaction records, and web pages, DISC systems have the potential to achieve major advances in science, health care, business, and information access. DISC opens up many important research topics in system design, resource management, programming models, parallel algorithms, and applications. By engaging the academic research community in these issues, we can explore fundamental aspects of a societally important style of computing more systematically and in a more open forum.


Tom Bishop's Theoretical Molecular Biology Lab

Dr. Bishop's research interests are in the area of theoretical and computational molecular biology, with a particular emphasis on molecular modeling and molecular dynamics simulations of proteins and DNA. The current focus is on developing a multiscale model of DNA, nucleosomes, and chromatin.

Structure and Dynamics of DNA, Nucleosomes and Chromatin.

The structure and dynamics of DNA and chromatin are being modeled by a combination of molecular dynamics and mathematical modeling techniques. The goal of this research is to develop an understanding of how local events, such as DNA binding, affect the global structure and dynamics of DNA and chromatin. For this purpose, all-atom molecular dynamics simulations are used to model DNA and protein-DNA interactions in solution. The results are analyzed to characterize the effects of the protein on the conformation and dynamics of the DNA, including DNA bending, twisting, and stretching. This information is then included in a model of DNA based on the theory of elastic rods, which predicts how local distortions of DNA alter the structure and dynamics of DNA and chromatin on longer length scales. The elastic rod model being developed thus provides a rigorous mathematical basis for analyzing how protein-DNA interactions and sequence-specific properties of DNA orchestrate cellular processes such as gene regulation.
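
For orientation, one common form of the energy functional in such rod models is the Kirchhoff elastic rod energy (shown here as a generic illustration, not necessarily the exact functional used in this lab), which penalizes deviations of the local curvatures and twist from their sequence-dependent intrinsic values:

    E = \frac{1}{2} \int_0^L \left[ A_1 \big(\kappa_1(s) - \hat{\kappa}_1(s)\big)^2 + A_2 \big(\kappa_2(s) - \hat{\kappa}_2(s)\big)^2 + C \big(\tau(s) - \hat{\tau}(s)\big)^2 \right] ds

Here A_1 and A_2 are bending stiffnesses, C is the twist stiffness, and the hatted quantities are the intrinsic (unstressed) curvatures and twist. In a multiscale scheme of this kind, the all-atom simulations supply local estimates of these parameters, which the rod model then propagates to longer length scales.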

Methods:
This research requires classical mechanics, mathematical and numerical analysis, numerical integration, computer programming, and molecular modeling and visualization techniques, as well as an understanding of the structure and function of DNA and chromatin.

PetaShare All Hands Meeting

PetaShare All Hands Meeting

Monday, March 3rd, 8:00am – noon
338 Johnston Hall (CCT)

08:00 am : Opening Remarks
Ed Seidel, LSU CCT/Dept. of Astronomy

08:20 am : PetaShare – State of the Union (slides)
Tevfik Kosar, LSU CCT/Dept. of Computer Science

09:00 am : LBRN and PetaShare Shared Goals (slides)
Bill Wischusen, LSU Dept. of Biological Sciences

09:15 am : High Freq. Geostationary Satellite Data in Hurricane Surveillance (slides)
Nan Walker, LSU Dept. of Oceanography and Coastal Sciences

09:30 am : CFD Investigations of Pulmonary Airway Reopening (slides)
Don Gaver, Tulane CCS/Dept. of Biomedical Engineering


09:45 am : Break

10:00 am : High Throughput High Performance Molecular Dynamics Simulations (slides)
Thomas Bishop, Tulane CCS

10:15 am : Synchrotron X-ray Tomography of Flame Retardants in Polymers (slides)
Les Butler, LSU Dept. of Chemistry

10:30 am : Distributed Machine Learning for Bio- and Cheminformatics (slides)
Stephen Winters-Hilt, UNO Dept. of Computer Science

10:45 am : Challenges to Developing a Coastal Ecosystem Forecasting System (slides)
Robert Twilley, LSU Dept. of Oceanography and Coastal Sciences


11:00 am : Status Reports from Site Leads (LSU, UNO, ULL, Tulane, and LaTech)
11:30 am : Wrap Up