Thursday, July 10, 2008

parallel NetCDF (scoop's format) and FastBit

http://www.scidacreview.org/0602/html/data.html

FastBit: Indexing for Fast Searches

In data mining and analysis, quickly isolating important information from much larger pools of data is critical. FastBit is a software package capable of extremely fast searches of large databases. In a series of head-to-head trials, FastBit considerably outperformed a leading database management system. Searching a dataset of 250,000 email messages, FastBit answered queries between ten and one thousand times faster than the popular commercial software.
Bitmaps are sequences of bits, the basic yes/no units of information represented by 1 or 0, which makes them a computationally practical representation. A bitmap index is a set of bit sequences that record information about certain indexed attributes. Because FastBit uses bitmap indices, user queries can be answered with bitwise logical operations, which computer hardware generally handles quite efficiently. However, scientific applications often require indices made up of a large number of bitmaps, and such indices demand impractical amounts of storage. Schemes that compress index files can reduce space requirements, but compression can also slow down searches. To maximize FastBit performance, researchers had to optimize this tradeoff between storage space and speed. FastBit achieves this balance with the Word-Aligned Hybrid (WAH) compression method. Bitmap indices compressed by the WAH scheme are a little larger than indices compressed by other methods, but WAH-compressed indices can be queried much faster because they can be searched without being fully decompressed.
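To make the bitmap index idea concrete, here is a minimal Python sketch. It is not FastBit's API and it does no WAH compression; it only illustrates the principle that one bitmap per distinct attribute value lets a multi-condition query be answered with bitwise AND/OR. The toy table and the helper names (build_bitmap_index, rows_of) are made up for illustration.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Return {value: bitmap}, where bit i of the bitmap is 1 when values[i] == value.

    Python integers are used as uncompressed bitmaps (an assumption for this sketch).
    """
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Return the row numbers whose bits are set in the bitmap, in ascending order."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Hypothetical toy table of six email messages (two indexed attributes).
senders = ["alice", "bob", "alice", "carol", "bob", "alice"]
folders = ["inbox", "inbox", "sent", "inbox", "sent", "inbox"]

sender_index = build_bitmap_index(senders)
folder_index = build_bitmap_index(folders)

# Query: sender == "alice" AND folder == "inbox" is a single bitwise AND.
hits = sender_index["alice"] & folder_index["inbox"]
print(rows_of(hits))  # [0, 5]

# Query: sender == "bob" OR sender == "carol" is a single bitwise OR.
either = sender_index["bob"] | sender_index["carol"]
print(rows_of(either))  # [1, 3, 4]
```

Note that the index holds one bitmap per distinct value, so an attribute with many distinct values produces many mostly-zero bitmaps. That is the storage problem the compression discussion above refers to.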
SDM researchers at LBNL developed both the FastBit software package and the WAH compression scheme it employs. A number of SciDAC-supported projects, including the STAR experiment (sidebar, p35) and combustion research (figure 3, p32), have benefited from the impressive power of FastBit.
