sparse

Trim Null Bytes from a Sparse File

I was running a server that didn’t do log rotation, so when the log grew too big I had to truncate it. This left me with a sparse file with a bunch of 0s (null bytes) at the beginning.

To trim the null bytes, first find the offset of the first non-null byte. This can be done with some C:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <fcntl.h>
#define BUF_SIZE 4 * 1024
int main(int argc, char *argv[]) {
char buf[BUF_SIZE];
unsigned long long count;
int n, i, fd;
if (argc != 2) {
fprintf(stderr, "Usage: %s FILE\n", argv[0]);
exit(1);
}
fd = open(argv[1], O_RDONLY);
if (fd < 0) {
fprintf(stderr, "Error opening file: %s: %s\n", argv[1], strerror(errno));
exit(1);
}
count = 0;
while ((n = read(fd, buf, BUF_SIZE)) > 0) {
for (i = 0; i < n; i++) if (buf[i] != 0) break;
count += i;
if (i != n) break;
}
if (n < 0) {
fprintf(stderr, "Error reading file: %s: %s\n", argv[1], strerror(errno));
exit(1);
}
printf("%ld\n", count);
return 0;
}

Find the offset of the first non-null byte:

1
2
3
gcc -o nulllength nulllength.c
./nulllength log.bin
7981307584

Trim the first 7981307584 bytes from the file:

1
dd if=log.bin of=trimmed.bin bs=7981307584 skip=1

Specifying the bs as offset and skipping 1 is faster than doing it the other way, because it would copy 1 byte at a time — slow!

Efficient Representation of a Sparse Array in JavaScript

I’m using an array to represent ranges of numbers. For example, in this array,

1
[1, 2, 3, 4, 15, 16, 17, 18]

there are two sequences of consequitive numbers: 1-4 and 15-18. In order to save space when serializing this array, it can be represented as ranges of numbers. For example:

1
[ [1, 4], [15, 18] ]

This function takes in an array of numbers, and returns an array of ranges:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
function sparsify(l) {
var i, j, k;
var ranges = []; // [ [from, to], [from, to], ... ]
function insertNum(n) {
for (j = 0; j < ranges.length; j++) {
if (ranges[j][0] <= n && n <= ranges[j][1]) return; // already in a range
if (n === ranges[j][0] - 1) // immediately before a range
ranges[j][0] = n;
else if (n === ranges[j][1] + 1) // immediately after a range
ranges[j][1] = n;
else
continue;
for (k = 0; k < ranges.length; k++) { // merge ranges
if (j === k) continue;
// do ranges j and k overlap?
if ( (ranges[j][0] <= ranges[k][0] && ranges[k][0] <= ranges[j][1] + 1) ||
(ranges[j][0] - 1 <= ranges[k][1] && ranges[k][1] <= ranges[j][1])) {
ranges[j] = [Math.min(ranges[j][0], ranges[k][0]), Math.max(ranges[j][1], ranges[k][1])];
ranges.splice(k, 1);
break;
}
}
return;
}
ranges.push([n, n]); // not found; add new range
}
for (i = 0; i < l.length; i++)
insertNum(l[i]);
return ranges;
}