I have a big JSON file (~8 GB) that contains one large array, and I need to split it into a group of small files, each containing part of the array.
The array contains objects.
I decided to implement this algorithm:
- read the file one character at a time
- append each character to a buffer
- try to parse the buffer as a JSON object
- if it parses, write the object to the current output file
- when the output file reaches the size limit, switch to a new file
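The steps above can be sketched as a plain in-memory function. This is a minimal sketch, not the asker's finished code: `splitArray` is a hypothetical helper name, and instead of writing files it groups a fixed number of parsed objects per "file" to keep the example self-contained.

```javascript
// Sketch of the buffer-and-try-parse approach over a string.
// splitArray is a hypothetical name; each inner array stands in for
// one output file, holding at most maxPerGroup objects.
function splitArray(text, maxPerGroup) {
  var groups = [], current = [], buffer = '';
  for (var i = 0; i < text.length; i++) {
    var ch = text[i];
    if (buffer === '' && ch !== '{') continue; // skip '[', ',', ']' and whitespace
    buffer += ch;
    if (ch !== '}') continue; // an object can only end on '}'
    try {
      current.push(JSON.parse(buffer)); // complete object found
      buffer = '';
      if (current.length === maxPerGroup) { // "rotate" the output file
        groups.push(current);
        current = [];
      }
    } catch (e) {
      // not a complete object yet (e.g. a '}' inside a nested object
      // or inside a string value) -- keep accumulating
    }
  }
  if (current.length) groups.push(current);
  return groups;
}

var input = '[{"id":1},{"id":2},{"id":3}]';
console.log(JSON.stringify(splitArray(input, 2)));
```

Note that failed `JSON.parse` calls on an ever-growing buffer make this quadratic in object size, which is one reason an event-based parser scales better.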
I tried to implement it myself but couldn't finish it; this is what I have:
var fs = require('fs');
var readable = fs.createReadStream("walmart.dump", { encoding: 'utf8', fd: null });
var chunk, buffer = '', counter = 0;
readable.on('readable', function () {
  readable.read(1); // skip the opening '[' (or a separator)
  while (null !== (chunk = readable.read(1))) {
    buffer += chunk; // chunk is a single character
    console.log(buffer.length);
    if (chunk !== '}') continue;
    try {
      var res = JSON.parse(buffer);
      console.log(res);
      readable.read(1);
      readable.read(1);
      readable.read(1);
      //Array.apply(null, {length: 10}).map(function () { return readable.read(1); });
      buffer = '{';
    } catch (e) {
      // buffer is not a complete object yet; keep reading
    }
  }
});
Did anyone resolve a similar problem?
The clarinet module (https://github.com/dscape/clarinet) looks quite promising to me. It's based on sax-js, so it should be quite robust and well tested.
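For reference, clarinet exposes a streaming, sax-style interface you can pipe the file into; this is only a rough sketch (event names taken from clarinet's README; the depth counting and the output-rotation comment are my own assumptions, not clarinet API):

```javascript
var fs = require('fs');
var clarinet = require('clarinet');

var stream = clarinet.createStream();
var depth = 0, count = 0;

// Track nesting depth so we know when a top-level array element ends;
// clarinet emits an event per token instead of buffering whole objects.
stream.on('openobject', function () { depth++; });
stream.on('closeobject', function () {
  depth--;
  if (depth === 0) {
    count++;
    // one complete top-level object has been consumed here;
    // rotate the output file once count hits the per-file limit
  }
});
stream.on('error', function (e) { console.error(e); });

fs.createReadStream('walmart.dump').pipe(stream);
```

Unlike the try-parse loop above, this never re-parses the buffer, so memory and CPU stay bounded regardless of how large the 8 GB array is.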