Mongo appears to be returning duplicate documents for the same query, i.e. it returns more documents than there are unique _ids in the returned documents:
lobby-brain> count_iterated = 0; ids = {}
{}
lobby-brain> db.the_collection.find({
'a_boolean_key': true
}).forEach((el) => {
count_iterated += 1;
ids[el._id] = (ids[el._id]||0) + 1;
})
lobby-brain> count_iterated
278
lobby-brain> Object.keys(ids).length
251
That is, only 251 unique _id values were returned, but the cursor returned 278 documents.
Investigating further:
lobby-brain> ids
{
'60cb8cb92c909a974a96a430': 1,
'61114dea1a13c86146729f21': 1,
'6111513a1a13c861467d3dcf': 1,
...
'61114c491a13c861466d39cf': 2,
'61114bcc1a13c861466b9f8e': 2,
...
}
lobby-brain> db.the_collection.find({
_id: ObjectId("61114c491a13c861466d39cf")
}).forEach((el) => print("foo"));
foo
>
That is, there aren't actually duplicate documents with the same _id -- it's just an issue with the .find().
I tried restarting the database, and rebuilding an index involving 'a_boolean_key', with the same results.
I've never seen this before and this seems impossible... what is causing this and how can I fix it?
Version info:
Using MongoDB: 5.0.5
Using Mongosh: 1.0.4
It is a stand-alone database, no replica set or sharding or anything like that.
Further Info
One thing to note: there is a compound index with a_boolean_key as the first field and a datetime field as the second. The boolean key is rarely updated (about once a day), but the datetime field is updated frequently.
Maybe these updates are causing the duplicate return values?
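For what it's worth, MongoDB's documentation on read isolation does note that a cursor can return the same document more than once if an interleaved write changes a field in the index the query is using. Below is a toy model of that effect in plain JavaScript, not MongoDB's actual implementation. Everything in it (`scanIndex`, the `updated_at` field name, the sample documents) is made up for illustration; it only assumes the scan walks entries in index order and that a write can land between two returned documents.

```javascript
// Toy model of scanning an index shaped like { a_boolean_key: 1, updated_at: 1 }.
// Hypothetical names throughout; this is not how the server is implemented.
function scanIndex(docs, onYield) {
  const returned = [];
  let lastKey = -Infinity; // current position of the scan within the index
  for (;;) {
    // Pick the next index entry strictly after the current position.
    const next = docs
      .filter((d) => d.a_boolean_key && d.updated_at > lastKey)
      .sort((a, b) => a.updated_at - b.updated_at)[0];
    if (!next) break;
    returned.push(next._id);
    lastKey = next.updated_at;
    onYield(next); // other operations may interleave at this point
  }
  return returned;
}

const docs = [
  { _id: 'a', a_boolean_key: true, updated_at: 1 },
  { _id: 'b', a_boolean_key: true, updated_at: 2 },
  { _id: 'c', a_boolean_key: true, updated_at: 3 },
];

// Right after 'a' is returned, a concurrent write moves its datetime
// (and therefore its index entry) ahead of the scan position.
let bumped = false;
const ids = scanIndex(docs, (doc) => {
  if (doc._id === 'a' && !bumped) {
    doc.updated_at = 10;
    bumped = true;
  }
});

console.log(ids.length, new Set(ids).size);
```

In this model the scan returns four results for three distinct _ids, because the updated entry for 'a' reappears further along the index. That matches the shape of what I'm seeing, though I can't confirm it is the actual cause.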
Update Feb 15, 2022: I added a Mongo JIRA task here.
Try checking whether you have an index on the a_boolean_key field.
When performing a count, MongoDB can return the count using only the index.
So maybe not all documents are covered by the index, and the result of the count method is therefore not equal to your manual count.
According to Louis Williams over at Mongo JIRA, this is not a bug but expected behavior.
Learn something new every day!
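If the duplicates are expected, one possible client-side workaround is to skip any _id that has already been seen while iterating. The helper below is only a sketch: `forEachUnique` is a hypothetical name, not a MongoDB or mongosh API, and it assumes the source exposes `forEach` (as both arrays and mongosh cursors do) and that `String(doc._id)` gives a stable key.

```javascript
// Hypothetical helper: invoke `callback` once per distinct _id.
// `source` is anything with a forEach method (an array here; a cursor in mongosh).
function forEachUnique(source, callback) {
  const seen = new Set();
  source.forEach((doc) => {
    const key = String(doc._id); // ObjectId values stringify to their hex form
    if (seen.has(key)) return;   // already processed this document
    seen.add(key);
    callback(doc);
  });
  return seen.size; // number of distinct documents processed
}

// Stand-in data for a cursor that yields one document twice.
const sample = [{ _id: 'x' }, { _id: 'y' }, { _id: 'x' }];
const visited = [];
const uniqueCount = forEachUnique(sample, (doc) => visited.push(doc._id));

console.log(uniqueCount, visited);
```

In mongosh the same function could presumably be called as `forEachUnique(db.the_collection.find({ a_boolean_key: true }), handler)`. The set of seen keys lives in memory, so this suits result sets of modest size.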