I have very large txt file type of '{some_string}|{repeated_value}', more than 50 millions lines.
2312|dog
4214215|cat
42141241|dog
fasfsa|cat
4214214|bird
42142141|cat
fasfsa|bird
fsafasf|dog
421jdsa|tiger
I need to segment than by it's value after "|" symbol. Somethings like this.
2312|dog
42141241|dog
fsafasf|dog
4214215|cat
fasfsa|cat
42142141|cat
4214214|bird
fasfsa|bird
42jdsa|tiger
Problem: Need to take one or n elements from each segment and feed to function until there 0 elements in each segment.(Round Robin)
Python example:
#Segmented lines
dog = ['2312|dog', '42141241|dog', 'fsafasf|dog', 'dsafas|dog']
cat = ['4214215|cat', 'fasfsa|cat', '42142141|cat']
bird = ['4214214|bird', '4214214|bird']
tiger = ['442jssa|tiger']
animals = [dog, cat, bird, tiger]
# Round Robin
while len(animals):
animal_list = animals.pop(0)
if len(animal_list) == 0:
continue
line = animal_list.pop(0)
print(line)
animals.append(animal_list)
JS example:
const dog = ['2312|dog', '42141241|dog', 'fsafasf|dog', 'dsafas|dog']
const cat = ['4214215|cat', 'fasfsa|cat', '42142141|cat']
const bird = ['4214214|bird', '4214214|bird']
const tiger = ['442jssa|tiger']
const animals = [dog, cat, bird, tiger]
// Round Robin
while (animals.length) {
let animal_list = animals.pop(0)
if (animal_list.length == 0) {
continue
}
let line = animal_list.pop(0)
console.log(line)
animals.push(animal_list)
}
Output:
2312|dog
4214215|cat
4214214|bird
442jssa|tiger
42141241|dog
fasfsa|cat
4214214|bird
fsafasf|dog
42142141|cat
dsafas|dog
I can't load all the lines into memory, there are thousands of "animals".
I came up with only one solution, read file line by line, for each animal create table in db. Create worker that would be fill up "animals" list when len < LIMIT, and implement round robin. Any other solution?