Wednesday, July 16, 2008

RADIANT Lab - Research on TCP tuning

  • RADIANT Lab @LANL


http://public.lanl.gov/radiant/pubs.html


  • Optimizing GridFTP Through Dynamic Right-Sizing.
    S. Thulasidasan, W. Feng, and M. Gardner
    IEEE Symposium on High-Performance Distributed Computing. (HPDC-12/2003), Seattle WA, June 2003.
    [LA-UR 03-2486] (8 pages - postscript, 339K; PDF, 267K)

  • Routing and Scheduling Large File Transfers over Lambda Grids.
    A. Banerjee, W. Feng, D. Ghosal, and B. Mukherjee
    The 3rd International Workshop on Protocols for Fast Long-Distance Networks (PFLDnet'05), Lyon, France, February 2005.
    [LA-UR 05-7911] (5 pages - PDF, 576K)
  • Scheduling and Transport for File Transfers on High-Speed Optical Circuits.
    M. Veeraraghavan, X. Zheng, W. Feng, H. Lee, E. Chong, and H. Li
    Journal of Grid Computing, Vol. 1, No. 4, June 2004.
    [LA-UR 04-2008] (18 pages - PDF, 195K)



Dynamic Right-Sizing (DRS) Software Distribution

This is the official distribution site for DRS.

Dynamic Right-Sizing provides automatic tuning of TCP flow-control windows to support high bandwidth over high-latency (WAN) links. It can improve TCP throughput by orders of magnitude on links with a large bandwidth-delay product, while keeping windows small for low-bandwidth, low-latency connections so they do not consume unnecessary amounts of memory.
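In rough terms, DRS has the receiver estimate the sender's rate from what has arrived recently, multiply by the round-trip time to get a bandwidth-delay product, and advertise a window with some headroom beyond that. The Python sketch below only illustrates that calculation; the factor of two and the example numbers are assumptions, not the actual kernel algorithm:

def drs_window(bytes_received, interval_s, rtt_s, current_window, max_window):
    # Toy sketch of the DRS calculation (illustrative, not the kernel patch):
    # estimate the sender's rate from what arrived in the last interval,
    # convert it to a bandwidth-delay product, and advertise roughly twice
    # that so the sender is never limited by the receiver's window.
    bandwidth_est = bytes_received / interval_s   # bytes/second seen by the receiver
    bdp_est = bandwidth_est * rtt_s               # bandwidth-delay product in bytes
    proposed = 2 * bdp_est                        # headroom so the sender can keep growing
    # Never shrink the advertised window, and never exceed the system maximum.
    return int(min(max(proposed, current_window), max_window))

# Example: 12 MB arrived in the last 100 ms on a path with a 70 ms RTT.
print(drs_window(12_000_000, 0.100, 0.070, 65_536, 16_000_000))   # -> 16000000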

There are two versions of DRS: one implemented in the Linux kernel and one implemented in user-space applications.

DRS Kernel-Space Implementation

The modifications to the Linux kernel are being distributed as a patch file under the GNU General Public License. If the GPL is too restrictive, please contact us.

Once you have the source for the correct version of the Linux kernel, download and apply the appropriate DRS patch file for your kernel. (The patch may also apply against a different kernel version, but you may have to patch some of the kernel files by hand.) For a synopsis of how to install the DRS kernel patch and rebuild your kernel, see the DRS Installation Instructions below. We regret that we are unable to assist with DRS installation questions. However, we would like to know if you find (and fix) a bug or make enhancements, so they can be incorporated into future releases.

DRS User-Space Applications

For some, installing DRS in the kernel can be a little daunting. As an interim solution until vendors install DRS by default, we are also providing DRS in user-space applications.

Unlike in the kernel version of DRS, the information necessary to implement DRS is not directly available to user-space applications and hence must be synthesized (see the sketch at the end of this subsection). As a result, DRS in kernel-space performs better than DRS in user-space. Even so, DRS in user-space provides a dramatic performance improvement over traditional applications.

Another disadvantage of DRS in user-space is that each application (or set of related applications) requires modification to implement the DRS algorithm. The following applications have been modified to support DRS.

  • FTP client and server (Coming soon.)

Besides the applications themselves, the maximum buffer space Linux allows will need to be increased so DRS has room to work.
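
As a rough illustration of the synthesis mentioned above, the sketch below (hypothetical code, not part of the DRS distribution) has a receiving application estimate its own throughput, combine that with an externally supplied RTT estimate, and enlarge SO_RCVBUF when more room is needed:

import socket
import time

def tuned_receive(sock, rtt_estimate_s, max_buf=16_000_000, interval_s=1.0):
    # Hypothetical user-space right-sizing loop (a sketch, not the LANL code).
    # A user-space program cannot see the kernel's RTT samples, so
    # rtt_estimate_s must be synthesized elsewhere, e.g. from an
    # application-level probe.
    received = 0
    window_start = time.monotonic()
    while True:
        chunk = sock.recv(256 * 1024)
        if not chunk:
            break
        received += len(chunk)
        elapsed = time.monotonic() - window_start
        if elapsed >= interval_s:
            bdp = (received / elapsed) * rtt_estimate_s    # bytes needed in flight
            wanted = min(int(2 * bdp), max_buf)
            current = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
            if wanted > current:
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, wanted)
            received = 0
            window_start = time.monotonic()

Note that the buffer size a process can request is itself capped by the system-wide maximums (rmem_max and wmem_max) discussed below, which is another reason those limits must be raised.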


DRS Kernel-Space Installation Instructions

  • Download an official kernel release from www.kernel.org or your favorite mirror. (Source packages from some Linux distributions, such as Debian, also work.)
  • Uncompress and untar the kernel source in an appropriate location (usually /usr/src).
  • Download the appropriate DRS patch.
  • Patch the kernel by entering the kernel source directory and typing the command patch -p1 < [path to DRS patch].
  • Configure, build and install the kernel in the usual way. (For an example, see the MAGNET distribution page.)
  • Increase the maximum buffer limits so DRS has room to work.

Tuning Linux Buffer Limits

Besides installing DRS, you'll want to raise the maximum buffer sizes. (The three values written to tcp_wmem and tcp_rmem below are the minimum, default, and maximum per-socket sizes; do not change the minimum and default values.) This is done by writing to the /proc file system as root:

echo 6553500 > /proc/sys/net/core/wmem_max
echo 6553500 > /proc/sys/net/core/rmem_max
echo 4096 16384 6553500 > /proc/sys/net/ipv4/tcp_wmem
echo 8192 87380 6553500 > /proc/sys/net/ipv4/tcp_rmem
echo 6553500 6653500 6753500 > /proc/sys/net/ipv4/tcp_mem
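
For a rough sense of where limits on the order of 6,553,500 bytes come from, a buffer needs to hold about one bandwidth-delay product to keep a fast WAN path full. The figures below are illustrative assumptions, not values from the DRS documentation:

# Illustrative bandwidth-delay product calculation (assumed example figures).
bandwidth_bps = 1_000_000_000            # a 1 Gb/s path
rtt_s = 0.050                            # 50 ms round-trip time
bdp_bytes = bandwidth_bps / 8 * rtt_s    # bytes that must be in flight to fill the pipe
print(int(bdp_bytes))                    # 6250000, on the order of the 6553500 limit above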

Data Compression


D.A. Lelewer and D.S. Hirschberg, "Data Compression," Computing Surveys 19(3) (1987), 261-297. Reprinted in Japanese BIT Special Issue in Computer Science (1989), 165-195. (in HTML)

Debra A. Lelewer and Daniel S. Hirschberg

Table of Contents

Abstract
INTRODUCTION
1. FUNDAMENTAL CONCEPTS
1.1 Definitions
1.2 Classification of Methods
1.3 A Data Compression Model
1.4 Motivation
2. SEMANTIC DEPENDENT METHODS
3. STATIC DEFINED-WORD SCHEMES
3.1 Shannon-Fano Coding
3.2 Static Huffman Coding
3.3 Universal Codes and Representations of the Integers
3.4 Arithmetic Coding
4. ADAPTIVE HUFFMAN CODING
4.1 Algorithm FGK
4.2 Algorithm V
5. OTHER ADAPTIVE METHODS
5.1 Lempel-Ziv Codes
5.2 Algorithm BSTW
6. EMPIRICAL RESULTS
7. SUSCEPTIBILITY TO ERROR
7.1 Static Codes
7.2 Adaptive Codes
8. NEW DIRECTIONS
9. SUMMARY
REFERENCES

Abstract

This paper surveys a variety of data compression methods spanning almost forty years of research, from the work of Shannon, Fano, and Huffman in the late 1940s to a technique developed in 1986. The aim of data compression is to reduce redundancy in stored or communicated data, thus increasing effective data density. Data compression has important application in the areas of file storage and distributed systems.

Concepts from information theory, as they relate to the goals and evaluation of data compression methods, are discussed briefly. A framework for evaluation and comparison of methods is constructed and applied to the algorithms presented. Comparisons of both theoretical and empirical natures are reported and possibilities for future research are suggested.

INTRODUCTION

Data compression is often referred to as coding, where coding is a very general term encompassing any special representation of data which satisfies a given need. Information theory is defined to be the study of efficient coding and its consequences, in the form of speed of transmission and probability of error [Ingels 1971]. Data compression may be viewed as a branch of information theory in which the primary objective is to minimize the amount of data to be transmitted. The purpose of this paper is to present and analyze a variety of data compression algorithms.

A simple characterization of data compression is that it involves transforming a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Data compression has important application in the areas of data transmission and data storage. Many data processing applications require storage of large volumes of data, and the number of such applications is constantly increasing as the use of computers extends to new disciplines. At the same time, the proliferation of computer communication networks is resulting in massive transfer of data over communication links. Compressing data to be stored or transmitted reduces storage and/or communication costs. When the amount of data to be transmitted is reduced, the effect is that of increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, thus faster, level of the storage hierarchy and reduce the load on the input/output channels of the computer system.
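
As a concrete, if toy, example of such a transformation, the sketch below builds a static Huffman code of the kind surveyed in Section 3.2, assigning short bit strings to frequent characters and longer ones to rare characters. It is an illustrative sketch only, not code from the paper:

import heapq
from collections import Counter

def huffman_code(text):
    # Build a static Huffman code: repeatedly merge the two least-frequent
    # subtrees, prefixing their codewords with 0 and 1 respectively.
    freq = Counter(text)
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                   # degenerate one-symbol input
        return {ch: "0" for ch in heap[0][2]}
    tie = len(heap)                      # tie-breaker so dicts are never compared
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in c1.items()}
        merged.update({ch: "1" + code for ch, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

text = "abracadabra"
codes = huffman_code(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)
print(len(text) * 8, "bits as 8-bit characters ->", len(encoded), "bits encoded")

For "abracadabra" the 88 bits of 8-bit character codes shrink to 23 bits, ignoring the cost of transmitting the code itself, which a static scheme must also send.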

Many of the methods to be discussed in this paper are implemented in production systems. The UNIX utilities compact and compress are based on methods to be discussed in Sections 4 and 5 respectively [UNIX 1984]. Popular file archival systems such as ARC and PKARC employ techniques presented in Sections 3 and 5 [ARC 1986; PKARC 1987]. The savings achieved by data compression can be dramatic; reduction as high as 80% is not uncommon [Reghbati 1981]. Typical values of compression provided by compact are: text (38%), Pascal source (43%), C source (36%) and binary (19%). Compress generally achieves better compression (50-60% for text such as source code and English), and takes less time to compute [UNIX 1984]. Arithmetic coding (Section 3.4) has been reported to reduce a file to anywhere from 12.1 to 73.5% of its original size [Witten et al. 1987]. Cormack reports that data compression programs based on Huffman coding (Section 3.2) reduced the size of a large student-record database by 42.1% when only some of the information was compressed. As a consequence of this size reduction, the number of disk operations required to load the database was reduced by 32.7% [Cormack 1985]. Data compression routines developed with specific applications in mind have achieved compression factors as high as 98% [Severance 1983].

While coding for purposes of data security (cryptography) and codes which guarantee a certain level of data integrity (error detection/correction) are topics worthy of attention, these do not fall under the umbrella of data compression. With the exception of a brief discussion of the susceptibility to error of the methods surveyed (Section 7), a discrete noiseless channel is assumed. That is, we assume a system in which a sequence of symbols chosen from a finite alphabet can be transmitted from one point to another without the possibility of error. Of course, the coding schemes described here may be combined with data security or error correcting codes.

Much of the available literature on data compression approaches the topic from the point of view of data transmission. As noted earlier, data compression is of value in data storage as well. Although this discussion will be framed in the terminology of data transmission, compression and decompression of data files for storage is essentially the same task as sending and receiving compressed data over a communication channel. The focus of this paper is on algorithms for data compression; it does not deal with hardware aspects of data transmission. The reader is referred to Cappellini for a discussion of techniques with natural hardware implementation [Cappellini 1985].

Background concepts in the form of terminology and a model for the study of data compression are provided in Section 1. Applications of data compression are also discussed in Section 1, to provide motivation for the material which follows.

While the primary focus of this survey is data compression methods of general utility, Section 2 includes examples from the literature in which ingenuity applied to domain-specific problems has yielded interesting coding techniques. These techniques are referred to as semantic dependent since they are designed to exploit the context and semantics of the data to achieve redundancy reduction. Semantic dependent techniques include the use of quadtrees, run length encoding, or difference mapping for storage and transmission of image data [Gonzalez and Wintz 1977; Samet 1984].

General-purpose techniques, which assume no knowledge of the information content of the data, are described in Sections 3-5. These descriptions are sufficiently detailed to provide an understanding of the techniques. The reader will need to consult the references for implementation details. In most cases, only worst-case analyses of the methods are feasible. To provide a more realistic picture of their effectiveness, empirical data is presented in Section 6. The susceptibility to error of the algorithms surveyed is discussed in Section 7 and possible directions for future research are considered in Section 8.

Research on TCP tuning - buffer size - parallel streams



Tom Dunigan's Home Page

http://www.csm.ornl.gov/~dunigan/


TCP auto-tuning (buffer size)
http://www.csm.ornl.gov/~dunigan/netperf/auto.html

  • Dynamic right-size (DRS)
  • Net100
  • Web100 research (using latency × bandwidth to calculate buffer size)
  • Automatic tcp buffer tuning
  • Linux 2.4 and auto tuning
  • slow start
  • WAD *

Parallel streams
http://www.csm.ornl.gov/~dunigan/netperf/parallel.html
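
As a toy sketch of the parallel-streams idea (hypothetical host, port, and server behavior; not code from the page above), several TCP connections are opened at once so that their aggregate window, and hence aggregate throughput over a long fat pipe, grows faster than a single stream's would:

import socket
import threading

def pull(host, port, results, i):
    # Receive everything the (hypothetical) server sends on one connection.
    with socket.create_connection((host, port)) as s:
        total = 0
        while True:
            chunk = s.recv(256 * 1024)
            if not chunk:
                break
            total += len(chunk)
        results[i] = total

def parallel_fetch(host, port, streams=4):
    # Open several TCP connections at once and sum what each receives.
    results = [0] * streams
    threads = [threading.Thread(target=pull, args=(host, port, results, i))
               for i in range(streams)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)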



Weigle and Feng (LANL), "A Comparison of TCP Automatic Tuning Techniques for Distributed Computing," 2002.

Friday, July 11, 2008

Stream Computing: A New Paradigm To Gain Insight and Value

www.almaden.ibm.com/institute/resources/2008/presentations/Halim.ppt


HPCwire >> Features

IBM Looks to Tap Massive Data Streams

...
And out of the box it will come. Although this is still a research project and there remains much work to be done before the product is shrink wrapped, Halim and his team are motivated by an expansive view that "stream computing is not just a new computing model, it is a new scientific instrument."

Thursday, July 10, 2008

parallel NetCDF (scoop's format) and FastBit

http://www.scidacreview.org/0602/html/data.html

FastBit: Indexing for Fast Searches

In data mining and analyses, the process of quickly isolating important information from much larger pools of data is critical. FastBit is a software package capable of extremely fast searches of large databases. During a series of head-to-head trials, FastBit considerably outperformed a leading database management system. Searching a dataset composed of 250,000 email messages, FastBit handled queries between ten and one thousand times faster than the popular commercial software.
Bitmaps are sequences of bits, basic yes/no units of information represented by 1 or 0, a computationally practical representation. A bitmap index is a set of bit sequences that represent information about certain indexed attributes. Because FastBit uses bitmap indices, user queries can be addressed by bitwise logical operations, which computer hardware systems generally handle quite efficiently. However, scientific applications often involve indices containing information about a large number of bitmaps, and such bitmap indices demand impractical storage requirements. Schemes that compress index files can reduce space requirements, but compression can also slow down search methods. To maximize FastBit performance, researchers had to optimize this tradeoff between storage space and speed. Using the Word-Aligned Hybrid (WAH) compression method, FastBit achieves this functional balance. Bitmap indices compressed by the WAH scheme are a little larger than indices compressed by other methods, but WAH-compressed indices can be queried much faster because they can be searched without being fully decompressed.
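
To make the bitmap-index idea concrete, here is a toy sketch, illustrative only and far simpler than FastBit's actual indices, in which each distinct value of a column gets one bitmap and a query becomes a bitwise operation:

def build_bitmap_index(values):
    # Toy bitmap index: one bitmap per distinct value, with bit i set when
    # row i holds that value. Python integers act as arbitrary-length bit vectors.
    index = {}
    for row, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row)
    return index

status = ["new", "read", "new", "spam", "read", "new"]   # one entry per row
idx = build_bitmap_index(status)

# A query such as "status = new OR status = spam" becomes a single bitwise OR.
hits = idx["new"] | idx["spam"]
print([r for r in range(len(status)) if (hits >> r) & 1])   # -> [0, 2, 3, 5]

WAH then encodes long runs of identical bits in word-sized chunks, which is why queries can operate on the compressed bitmaps without fully decompressing them.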
SDM researchers at LBNL developed both the FastBit software package and the WAH compression scheme it employs. A number of SciDAC-supported projects, including the STAR experiment (sidebar, p35) and combustion research (figure 3, p32), have benefited from the impressive power of FastBit.