I'm downloading this AWS S3 object on my local Node.js server with this -
var fs = require('fs');
var http = require('http');

var url = "http://s3.amazonaws.com/cloudfront.s3post.cf/s3posts.json.gz";
var dest = "./s3posts.json.gz";

// Stream the HTTP response into a local file and call cb once it is closed.
var download = function(url, dest, cb) {
    var file = fs.createWriteStream(dest);
    var request = http.get(url, function(response) {
        response.pipe(file);
        file.on('finish', function() {
            file.close(cb);
        });
    });
};

download(url, dest, function() {
    console.log('Download complete');
});
This successfully downloads a .json.gz object. I'm trying to unzip this object using zlib -
var zlib = require('zlib');

var gunzip = zlib.createGunzip();
var rstream = fs.createReadStream('./s3posts.json.gz');
var wstream = fs.createWriteStream('./s3posts.json');
rstream.pipe(gunzip).pipe(wstream);
However, this throws an error and the .json file that is created is empty -
events.js:163
throw er; // Unhandled 'error' event
^
Error: unexpected end of file
at Zlib._handle.onerror (zlib.js:355:17)
Weirdly, if I use only the download code to download the object and unzip it manually using gunzip s3posts.json.gz on the terminal, the created json file is filled with content and I can run my app successfully.
I'm not sure why I'm able to unzip manually but can't do it programmatically with zlib. It would be really helpful if someone could point out whether I'm making a mistake.
The S3 object has the following metadata if it's relevant -
Cache-Control: max-age=31536000,no-transform,public
Content-Encoding: gzip
Content-Type: application/json
You don't check for errors when you download the gzipped file, so you end up saving an empty file. You then try to decompress that empty file and get an error, which is also unhandled, so your program crashes.
Handle all errors and you will know what went wrong. From your example it's impossible to say anything more than that the .gz file is likely empty because something apparently went wrong with the download. What exactly went wrong is a mystery, because your code doesn't check for errors.
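As a rough illustration of what checking for errors could look like, here is a minimal sketch of the download function with each failure routed to the callback. The name downloadChecked and the choice to delete the partial file on failure are my own assumptions, not something from the question, and it does not cover every edge case (redirects, for example).

```javascript
var fs = require('fs');
var http = require('http');

// Hypothetical helper: like download(), but reports failures through cb(err).
function downloadChecked(url, dest, cb) {
    var finished = false;

    // Make sure cb runs exactly once, whichever event fires first.
    function done(err) {
        if (finished) return;
        finished = true;
        if (err) {
            // Remove any partial file so a later step never reads a truncated archive.
            fs.unlink(dest, function() { cb(err); });
        } else {
            cb(null);
        }
    }

    var request = http.get(url, function(response) {
        if (response.statusCode !== 200) {
            response.resume(); // discard the body so the socket is released
            return done(new Error('Unexpected status code: ' + response.statusCode));
        }
        var file = fs.createWriteStream(dest);
        file.on('error', done);
        response.on('error', function(err) {
            file.destroy();
            done(err);
        });
        response.pipe(file);
        file.on('finish', function() {
            file.close(done);
        });
    });

    // Network-level failures (DNS, refused connection, ...) surface here.
    request.on('error', done);
}
```

Calling downloadChecked(url, dest, function(err) { ... }) and logging err when it is set should show whether the request, the HTTP status, or the file write is the step that fails.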
It turns out that I wasn't waiting for the download to finish before unzipping the file, which is why the generated json was empty. I had to run the unzipping code inside the download callback -
download(url, dest, function() {
    console.log('Download complete');
    var gunzip = zlib.createGunzip();
    var rstream = fs.createReadStream('./s3posts.json.gz');
    var wstream = fs.createWriteStream('./s3posts.json');
    rstream.pipe(gunzip).pipe(wstream);
});
This also explains why I was able to unzip the file manually after running just the download code: the download had completed by then, so gunzip on the terminal was working on the full file.
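The chained pipe() calls above still leave stream errors unhandled, so a truncated .gz would crash the process again. One possible way around that, assuming Node.js 10 or later where stream.pipeline is available, is sketched below. The helper name gunzipFile is hypothetical.

```javascript
var fs = require('fs');
var zlib = require('zlib');
var stream = require('stream');

// Hypothetical helper: gunzip src into dest and report the outcome via cb(err).
function gunzipFile(src, dest, cb) {
    stream.pipeline(
        fs.createReadStream(src),
        zlib.createGunzip(),
        fs.createWriteStream(dest),
        cb // called once: with an error if any stage failed, otherwise without one
    );
}
```

Inside the download callback this would be used as gunzipFile('./s3posts.json.gz', './s3posts.json', function(err) { ... }), so the unzip still starts only after the download has finished, and an "unexpected end of file" arrives as err instead of an unhandled 'error' event.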