I want to parse a 25 MB JSON file using TypeScript and then filter and sort the objects it contains. The code I wrote takes minutes (and sometimes times out), and I'm not sure why this is happening or whether there's a more efficient way to write it.
Note: the code works fine on a small file.
import * as fs from 'fs';

// Assumed shapes: the question does not show these types.
interface Account {
  firstName: string;
  lastName: string;
  country: string;
  mfa: string;
  dob: string;
}

interface AccountSearchCriteria {
  country?: string;
  mfa?: string;
  name?: string;
  sortField?: keyof Account;
}

async function searchAccounts(): Promise<Account[]> {
  const accountSearchCriteria: AccountSearchCriteria = {
    country: 'NZ',
    mfa: 'SMS',
    name: 'TEST',
    sortField: 'dob'
  };
  const jsonPath = './src/file.json';
  const rawAccounts = fs.readFileSync(jsonPath, 'utf-8');
  let accounts: Account[] = JSON.parse(rawAccounts);

  // Lower-case each criterion once, instead of once per account.
  const name = accountSearchCriteria.name?.toLowerCase();
  if (name) {
    accounts = accounts.filter(
      account =>
        account.firstName.toLowerCase() === name ||
        account.lastName.toLowerCase() === name
    );
  }
  const country = accountSearchCriteria.country?.toLowerCase();
  if (country) {
    accounts = accounts.filter(
      account => account.country.toLowerCase() === country
    );
  }
  const mfa = accountSearchCriteria.mfa;
  if (mfa) {
    accounts = accounts.filter(account => account.mfa === mfa);
  }
  const sortField = accountSearchCriteria.sortField;
  if (sortField) {
    // A comparator must return 0 for equal keys, not 1.
    accounts.sort((a, b) =>
      a[sortField] < b[sortField] ? -1 : a[sortField] > b[sortField] ? 1 : 0
    );
  }
  return accounts;
}
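Since your data is 25 MB, you could try a more memory-efficient sorting algorithm, such as cycle sort. Cycle sort works in place and performs the minimum possible number of writes, but it always makes O(n²) comparisons, so on a large array it may well be slower than the built-in Array.prototype.sort.
You can find a cycle sort implementation here; try it in your code to see whether it makes a difference.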
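For reference, a minimal cycle sort sketch in TypeScript. The name cycleSort is hypothetical (not from a library), and it assumes a comparator with the same contract as Array.prototype.sort (negative, zero or positive). It sorts the array in place and returns the number of writes it made; the nested counting loops are what make it O(n²).

```typescript
// Hypothetical helper: in-place cycle sort using an Array.prototype.sort-style comparator.
function cycleSort<T>(arr: T[], compare: (a: T, b: T) => number): number {
  let writes = 0;
  for (let cycleStart = 0; cycleStart < arr.length - 1; cycleStart++) {
    let item = arr[cycleStart];

    // The item's final position = cycleStart + number of smaller elements to its right.
    let pos = cycleStart;
    for (let i = cycleStart + 1; i < arr.length; i++) {
      if (compare(arr[i], item) < 0) pos++;
    }
    if (pos === cycleStart) continue; // already in the right place

    // Skip past equal elements so duplicates are not swapped endlessly.
    while (compare(item, arr[pos]) === 0) pos++;
    [arr[pos], item] = [item, arr[pos]];
    writes++;

    // Rotate the rest of the cycle until we come back to cycleStart.
    while (pos !== cycleStart) {
      pos = cycleStart;
      for (let i = cycleStart + 1; i < arr.length; i++) {
        if (compare(arr[i], item) < 0) pos++;
      }
      while (compare(item, arr[pos]) === 0) pos++;
      [arr[pos], item] = [item, arr[pos]];
      writes++;
    }
  }
  return writes;
}
```

Usage with the question's sort field might look like cycleSort(accounts, (a, b) => (a.dob < b.dob ? -1 : a.dob > b.dob ? 1 : 0)). Applying it only after filtering keeps n, and therefore the quadratic cost, as small as possible.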
Node.js runs your code on a single thread: if your code blocks that thread for a long time, nothing else can run in the meantime, which is what leads to timeouts. There are two problems with your code.
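fs.readFileSync(jsonPath, 'utf-8') is a synchronous function, so it blocks the thread while the file is being read. Use the asynchronous fs.readFile(path, encoding, callback) instead:
const fs = require('fs');

// Reads the file without blocking the event loop; the callback runs once the read completes.
fs.readFile('./src/file.json', 'utf-8', function read(err, data) {
  if (err) {
    throw err;
  }
  // data is a string because an encoding was passed (without one it would be a Buffer).
  const accounts = JSON.parse(data);
  console.log(accounts.length);
});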
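Since the question's searchAccounts already returns a Promise<Account[]>, the promise-based fs/promises API may fit more naturally than a callback. A minimal sketch, assuming the Account shape below (the question does not show it); loadAccounts is a hypothetical helper name. Only the file read is asynchronous here: JSON.parse still runs synchronously and holds the main thread while it parses.

```typescript
import { readFile } from 'fs/promises';

// Assumed minimal shape of an account; adjust to the real type.
interface Account {
  firstName: string;
  lastName: string;
  country: string;
  mfa: string;
  dob: string;
}

// Hypothetical helper: awaits the read without blocking the event loop,
// then parses synchronously.
async function loadAccounts(jsonPath: string): Promise<Account[]> {
  const raw = await readFile(jsonPath, 'utf-8');
  return JSON.parse(raw) as Account[];
}
```

Inside an async searchAccounts this becomes let accounts = await loadAccounts('./src/file.json'); followed by the same filtering and sorting.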
Note: Node.js is not well suited to CPU-bound tasks such as sorting or image processing, because they occupy the single thread until they finish; it is best at I/O-bound tasks.
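One way to keep CPU-bound work such as parsing and sorting off the main thread is Node's built-in worker_threads module. A sketch, assuming the same account fields as the question; searchInWorker and Criteria are hypothetical names, and the worker body is inline plain JavaScript (eval: true) so the example stays in one file. The work takes just as long inside the worker, but the main thread stays free to handle other requests meanwhile.

```typescript
import { Worker } from 'worker_threads';

// Hypothetical criteria shape mirroring the question's example.
interface Criteria {
  country: string;
  mfa: string;
  name: string;
  sortField: string;
}

// Plain JavaScript executed inside the worker (eval: true expects JS, not TS).
const workerSource = `
const { parentPort, workerData } = require('worker_threads');
const fs = require('fs');
const { jsonPath, criteria } = workerData;
const name = criteria.name.toLowerCase();
const country = criteria.country.toLowerCase();
const field = criteria.sortField;
// Reading, parsing, filtering and sorting all block only the worker's thread.
const accounts = JSON.parse(fs.readFileSync(jsonPath, 'utf-8'))
  .filter(a =>
    (a.firstName.toLowerCase() === name || a.lastName.toLowerCase() === name) &&
    a.country.toLowerCase() === country &&
    a.mfa === criteria.mfa)
  .sort((a, b) => (a[field] < b[field] ? -1 : a[field] > b[field] ? 1 : 0));
parentPort.postMessage(accounts);
`;

// Resolves with the filtered, sorted accounts posted back by the worker.
function searchInWorker(jsonPath: string, criteria: Criteria): Promise<unknown[]> {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { jsonPath, criteria },
    });
    worker.once('message', resolve);
    worker.once('error', reject);
    worker.once('exit', code => {
      if (code !== 0) reject(new Error(`Worker stopped with exit code ${code}`));
    });
  });
}
```

The result is copied back to the main thread with the structured clone algorithm, so filtering inside the worker, rather than posting the whole parsed file, keeps that copy small.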