Distributed Checksum Calculations for KORA
Ever wonder how to detect errors in your data? At MSU's MATRIX Center for Humane, Arts, and Sciences this is a critical issue, as they are building a digital archive called Kora that stores a large amount of multimedia data. By performing some simple arithmetic and logical operations on a file's contents, you can obtain a short value called a checksum that acts as a fingerprint of the file. Recompute the checksum later and compare it with the original: if the two values differ, the file has changed.
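As a rough illustration of the idea, here is a minimal Python sketch of computing a file checksum and using it to notice a change. The choice of SHA-256 from the standard hashlib module, the function name file_checksum, and the chunk size are assumptions made for this example; the text does not say which checksum algorithm the Kora system uses.

```python
import hashlib
import os
import tempfile


def file_checksum(path, chunk_size=65536):
    """Return a hex checksum of the file at `path`.

    SHA-256 is an assumption for this sketch, not necessarily the
    algorithm used by the original system.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in fixed-size chunks so large multimedia files
        # never have to fit in memory all at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.bin")
        with open(path, "wb") as handle:
            handle.write(b"archive record")
        before = file_checksum(path)

        # Simulate corruption by changing a single byte.
        with open(path, "wb") as handle:
            handle.write(b"archive recorD")
        after = file_checksum(path)

        print("file changed:", before != after)
```

Reading in chunks is a design choice for large files; any checksum function with the same "same bytes in, same value out" property would work in its place.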
What happens if we have lots of data? This is where the power of distributed computing comes in. Distributed computing is just another name for using multiple computers to work on different parts of a problem. Each file is a different part of the whole problem: detecting errors in the Kora archive.
How does this work? The system consists of two main software components: a server program and a set of client programs. The server retrieves a list of the files that make up the Kora archive and assigns these files one by one to the clients. Each client retrieves the file it has currently been assigned, performs the checksum computation, and returns the checksum to the server. This is repeated until all the files in the archive have been checksummed. The server then compares the newly computed checksum for each file against a database of stored checksums. If the checksums don't match, the data has changed, which indicates an error.
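The server and client roles described above can be sketched in a few lines of Python. This is only an approximation under stated assumptions: a thread pool from the standard library stands in for clients running on separate computers, a plain dictionary stands in for the database of stored checksums, SHA-256 stands in for the unspecified checksum algorithm, and the names client_checksum and verify_archive are hypothetical.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def client_checksum(path):
    """Client role: read the assigned file and return (path, checksum)."""
    digest = hashlib.sha256()  # assumed algorithm, for illustration only
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return path, digest.hexdigest()


def verify_archive(paths, stored_checksums, workers=4):
    """Server role: assign files to workers, then compare the results.

    `stored_checksums` maps path -> expected checksum and stands in for
    the database. Returns a sorted list of paths whose checksum differs
    from the stored one, or that have no stored checksum at all.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker picks up the next file as soon as it is free.
        computed = dict(pool.map(client_checksum, paths))
    return sorted(
        path for path, checksum in computed.items()
        if stored_checksums.get(path) != checksum
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, data in [("a.txt", b"alpha"), ("b.txt", b"beta")]:
            path = os.path.join(tmp, name)
            with open(path, "wb") as handle:
                handle.write(data)
            paths.append(path)

        # Record the "known good" checksums, then damage one file.
        stored = dict(client_checksum(p) for p in paths)
        with open(paths[1], "wb") as handle:
            handle.write(b"betA")

        for bad in verify_archive(paths, stored):
            print("checksum mismatch:", os.path.basename(bad))
```

In a real deployment the workers would be processes on other machines and the lookup would be a database query, but the control flow (hand out files, collect checksums, compare) stays the same.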
The system was written in Python using the Parallel Python library and is designed to interface with a MySQL database.
Dustin Manning, Jared Wein, Chung-Hi Kim, Chris Samiadji-Benthin