Mongo .find() returning duplicate documents (with same _id) (!)

Mongo appears to be returning duplicate documents for the same query, i.e. it returns more documents than there are unique _ids in the returned documents:

lobby-brain> count_iterated = 0; ids = {}
{}
lobby-brain> db.the_collection.find({
    'a_boolean_key': true 
}).forEach((el) => {
    count_iterated += 1; 
    ids[el._id] = (ids[el._id]||0) + 1;
})
lobby-brain> count_iterated
278
lobby-brain> Object.keys(ids).length
251

That is, the number of unique _id returned is 251 -- but there were 278 documents returned by the cursor.

Investigating further:

lobby-brain> ids
{
  '60cb8cb92c909a974a96a430': 1,
  '61114dea1a13c86146729f21': 1,
  '6111513a1a13c861467d3dcf': 1,
  ...
  '61114c491a13c861466d39cf': 2,
  '61114bcc1a13c861466b9f8e': 2,
  ...
}
lobby-brain> db.the_collection.find({
    _id: ObjectId("61114c491a13c861466d39cf")
}).forEach((el) => print("foo"));
foo

> 

That is, there aren't actually duplicate documents with the same _id -- it's just an issue with the .find().
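To list only the _id values the cursor returned more than once, the ids tally can be filtered. A small sketch in plain JavaScript (it should run in mongosh or Node); duplicatedIds is a hypothetical helper name, and the sample tally just reuses a few of the ids shown above:

```javascript
// Hypothetical helper: given a tally of {_id string: times seen},
// return the ids that were seen more than once.
function duplicatedIds(tally) {
  return Object.entries(tally)
    .filter(([, seen]) => seen > 1)
    .map(([id]) => id);
}

// Sample tally, reusing a few of the ids from the output above.
const sampleTally = {
  '60cb8cb92c909a974a96a430': 1,
  '61114c491a13c861466d39cf': 2,
  '61114bcc1a13c861466b9f8e': 2,
};
const sampleDuplicates = duplicatedIds(sampleTally);
```

In the shell, duplicatedIds(ids) would give the list of ids to look up individually.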

I tried restarting the database, and rebuilding an index involving 'a_boolean_key', with the same results.

I've never seen this before and this seems impossible... what is causing this and how can I fix it?

Version info:

Using MongoDB:          5.0.5
Using Mongosh:          1.0.4

It is a stand-alone database, no replica set or sharding or anything like that.

Further Info

One thing to note: there is a compound index with a_boolean_key as the first key and a datetime field as the second. The boolean key is rarely updated (about once a day), but the datetime field is updated frequently.

Maybe these updates are causing the duplicate return values?
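The hypothesis can be illustrated with a toy model. This is only a sketch under assumptions: a sorted array stands in for the compound index, the scan resumes from the last key it returned, and one update to the datetime field lands mid-scan. It does not describe MongoDB's actual implementation, and all names (compareKeys, scanWithConcurrentUpdate, flag, ts) are made up for the illustration:

```javascript
// Toy model only: a sorted array stands in for a compound index on
// (a_boolean_key, datetime field). Not MongoDB internals.
function compareKeys(a, b) {
  return (Number(a.flag) - Number(b.flag)) || (a.ts - b.ts) || a.id.localeCompare(b.id);
}

function scanWithConcurrentUpdate(docs, updatedId, newTs) {
  // Build the "index": one entry per document, sorted by (flag, ts, id).
  const index = docs.map((d) => ({ ...d })).sort(compareKeys);
  const returned = [];
  let last = null; // copy of the key of the last entry the scan returned

  for (;;) {
    // Resume from the first matching entry strictly after the last returned key.
    const next = index.find(
      (e) => e.flag === true && (last === null || compareKeys(e, last) > 0)
    );
    if (!next) break;
    returned.push(next.id);
    last = { ...next };

    // Simulate one write landing mid-scan: bump the datetime of a document
    // that was already returned, which moves its entry ahead of the scan.
    if (next.id === updatedId && next.ts !== newTs) {
      next.ts = newTs;
      index.sort(compareKeys);
    }
  }
  return returned;
}

const scanned = scanWithConcurrentUpdate(
  [
    { id: 'a', flag: true, ts: 1 },
    { id: 'b', flag: true, ts: 2 },
    { id: 'c', flag: true, ts: 3 },
  ],
  'a', // document updated right after the scan returns it
  10   // new datetime value, later than every other entry
);
```

In this model the scan returns four results for three documents, because document 'a' is met a second time at its new position.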

Update Feb 15, 2022: I added a Mongo JIRA task here.

13 days ago · Santiago Trujillo

2 answers

Check whether the a_boolean_key field is indexed.

When performing a count, MongoDB can return the count using only the index.

So perhaps the index does not cover all documents, which would explain why the count result differs from your manual count.

13 days ago · Santiago Trujillo

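According to Louis Williams over at Mongo JIRA, this is not a bug but expected behavior. The MongoDB documentation describes the general case: while a cursor is returning documents, other operations may interleave with the query, and if one of them changes an indexed field of the index the query is using, the cursor can return the same document more than once.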

Learn something new every day!
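If duplicates in the result are a problem, one option is to de-duplicate on the client by _id. A minimal sketch, assuming it is acceptable to keep the first version of each document the cursor returns; uniqueById is a hypothetical helper, not a MongoDB API, and it only relies on the source having a forEach method (as mongosh cursors and plain arrays do):

```javascript
// Hypothetical helper: iterate anything with forEach (a mongosh cursor or a
// plain array) and keep only the first document seen for each _id.
function uniqueById(source) {
  const seen = new Set();
  const unique = [];
  source.forEach((doc) => {
    const key = String(doc._id); // an ObjectId stringifies to its hex form
    if (!seen.has(key)) {
      seen.add(key);
      unique.push(doc);
    }
  });
  return unique;
}

// Stand-in data so the sketch runs without a database.
const sampleDocs = [
  { _id: '61114c491a13c861466d39cf', a_boolean_key: true },
  { _id: '60cb8cb92c909a974a96a430', a_boolean_key: true },
  { _id: '61114c491a13c861466d39cf', a_boolean_key: true },
];
const uniqueDocs = uniqueById(sampleDocs);
```

In mongosh the assumed usage would be uniqueById(db.the_collection.find({ a_boolean_key: true })). Note that this keeps whichever copy of a document arrives first, which may not be the most recent one.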

13 days ago · Santiago Trujillo