Segment a large text file and round-robin over the items in each segment

I have a very large text file with more than 50 million lines, each in the format '{some_string}|{repeated_value}'.

2312|dog
4214215|cat
42141241|dog
fasfsa|cat
4214214|bird
42142141|cat
fasfsa|bird
fsafasf|dog
421jdsa|tiger

I need to segment the lines by the value after the "|" symbol, something like this:

2312|dog
42141241|dog
fsafasf|dog

4214215|cat
fasfsa|cat
42142141|cat

4214214|bird
fasfsa|bird

421jdsa|tiger

Problem: I need to take one (or n) elements from each segment in turn and feed them to a function until every segment is empty (round robin).

Python example:

#Segmented lines
dog = ['2312|dog', '42141241|dog', 'fsafasf|dog', 'dsafas|dog']
cat = ['4214215|cat', 'fasfsa|cat', '42142141|cat']
bird = ['4214214|bird', '4214214|bird']
tiger = ['442jssa|tiger']

animals = [dog, cat, bird, tiger]

# Round Robin
while len(animals):

    animal_list = animals.pop(0)

    if len(animal_list) == 0:
        continue

    line = animal_list.pop(0)
    print(line)

    animals.append(animal_list)

JS example:

const dog = ['2312|dog', '42141241|dog', 'fsafasf|dog', 'dsafas|dog']
const cat = ['4214215|cat', 'fasfsa|cat', '42142141|cat']
const bird = ['4214214|bird', '4214214|bird']
const tiger = ['442jssa|tiger']

const animals = [dog, cat, bird, tiger]

// Round Robin
while (animals.length) {
  // shift() removes the first element; pop() ignores its argument and removes the last
  let animal_list = animals.shift()

  if (animal_list.length === 0) {
    continue
  }

  let line = animal_list.shift()
  console.log(line)

  animals.push(animal_list)
}

Output:

2312|dog
4214215|cat
4214214|bird
442jssa|tiger
42141241|dog
fasfsa|cat
4214214|bird
fsafasf|dog
42142141|cat
dsafas|dog
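
The examples above take one item per turn. A minimal sketch of the "n elements per turn" variant, assuming the segments still fit in memory; the function name `round_robin` and the parameter `n` are illustrative and not part of the original code:

```python
from collections import deque
from itertools import islice


def round_robin(segments, n=1):
    """Yield up to n items from each segment in turn until all are exhausted."""
    # A deque gives O(1) pops from the left, unlike list.pop(0).
    queue = deque(iter(segment) for segment in segments)
    while queue:
        current = queue.popleft()
        batch = list(islice(current, n))
        if not batch:
            continue  # segment is exhausted, so it is dropped from the rotation
        yield from batch
        queue.append(current)


dog = ['2312|dog', '42141241|dog', 'fsafasf|dog', 'dsafas|dog']
cat = ['4214215|cat', 'fasfsa|cat', '42142141|cat']
bird = ['4214214|bird', '4214214|bird']
tiger = ['442jssa|tiger']

for line in round_robin([dog, cat, bird, tiger], n=2):
    print(line)
```

With `n=1` this should reproduce the output listed above.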

I can't load all the lines into memory, and there are thousands of "animals".

The only solution I have come up with is to read the file line by line and create a table in a database for each animal, then run a worker that refills the in-memory "animals" lists whenever their length drops below LIMIT, and do the round robin over those lists. Is there any other solution?
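
One possible variation on the database idea, sketched under several assumptions: SQLite as the database, a single table instead of one table per animal, and only a small counter per animal kept in memory. Each line is stored with its position inside its segment, and the round-robin order is then simply "sort by position, then by segment". The names `build_table`, `round_robin_from_db`, `items`, `pos`, and `seg` are hypothetical, and this has not been measured on a 50-million-line file.

```python
import sqlite3
from collections import defaultdict


def build_table(lines, conn):
    """Stream the lines once, storing each with its position inside its segment."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (pos INTEGER, seg INTEGER, line TEXT)"
    )
    seg_ids = {}               # segment value -> order of first appearance
    counts = defaultdict(int)  # segment value -> number of lines seen so far

    def rows():
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            key = line.rpartition("|")[2]  # text after the last "|"
            seg = seg_ids.setdefault(key, len(seg_ids))
            yield counts[key], seg, line
            counts[key] += 1

    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows())
    conn.commit()


def round_robin_from_db(conn, n=1):
    """Yield lines in round-robin order, n lines per segment per turn."""
    # Integer division groups positions into turns of size n.
    query = "SELECT line FROM items ORDER BY pos / ?, seg, pos"
    for (line,) in conn.execute(query, (n,)):
        yield line


if __name__ == "__main__":
    sample = ["2312|dog", "4214215|cat", "42141241|dog", "4214214|bird"]
    conn = sqlite3.connect(":memory:")  # a file path would be used for real data
    build_table(sample, conn)
    for line in round_robin_from_db(conn):
        print(line)
```

For the real file, `lines` could be the open file object itself, so that only one line is held in memory at a time while the table is built.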

7 months ago · Juan Pablo Isaza