I have the following scenario in MongoDB:
Every record has its own _id and a parentId. If a parentId== "" then it's a true parent record. If the parentId has a value then that record is actually a child pointing to the parent record. The following shows one parent and its linked child.
{"_id": ObjectId('586c9d275d2f62e1634978db'), parentId="", count=1, <other fields>}
{"_id": ObjectId('586c9d275d2f62e163497811'), parentId=ObjectId('586c9d275d2f62e1634978db'), count=3, <other fields>}
I wanted a query where by I find all the parent records sorted by the count field where all parents and the child records are grouped together. For example, easiest explained via diagram:
ID6 has the highest count value which is associated to parent ID5. The next highest count is ID2 which is associated to parent ID1 finally ID4 is a parent and should also be returned so the results should be:
ID5, ID1, ID4
HoefMeistert helped me come up with the following query:
db.collection.aggregate(
[
{
$project: {
group_id : { $cond : { if: { $ne: [ "$parentId", "" ] }, then: "$parentId", else: "$_id" }},
count :1,
field1:1,
field2:1
}
},
{
$group: {
_id : "$group_id",
highest : { $max: "$count" }
},
"field1":{"$first":"$field1"},
"field2":{"$first":"$field2"},
},
{
$sort: {
highest : -1
}
}
]
);
The problem with this query is that it doesn't return the field1 and field2 associated to the parents I.e. ID1 and ID5 in the diagram. Is there a way to project the correct fields associated to the parents at the group stage? Otherwise if the group stage returns something like:
{'_id': ObjectId('586c9d275d2f62e1634978db'), 'highest': 2}
{'_id': ObjectId('586c9d0d5d2f62e1634978d5'), 'highest': 1}
{'_id': ObjectId('586c9d365d2f62e1634978e3'), 'highest': 0}
How can I rematch after the group to pull back the whole records for all the Ids above? I.e. 586c9d275d2f62e1634978db, 586c9d0d5d2f62e1634978d5, 586c9d365d2f62e1634978e3 ??
Your query has an error, field1
and field2
need to be inside the $group
dict:
db.collection.aggregate([
{
$project: {
group_id: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: "$parentId", else: "$_id" }},
count: 1,
field1: 1,
field2: 1
}
},
{
$group: {
_id: "$group_id",
highest: { $max: "$count"},
field1: { "$first": "$field1"},
field2: { "$first":" $field2"},
},
},
{
$sort: {
highest : -1
}
}
]);
Result based on your diagram:
{ "_id" : "5", "highest" : 5, "field1" : ..., "field2" : ... }
{ "_id" : "1", "highest" : 3, "field1" : ..., "field2" : ... }
{ "_id" : "4", "highest" : 1, "field1" : ..., "field2" : ... }
Edit:
db.collection.aggregate([ { $project: { group_id: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: "$parentId", else: "$_id" }}, count: 1, field1: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: null, else: "$field1" }}, field2: { $cond: { if: { $ne: [ "$parentId", "" ] }, then: null, else: "$field2" }}, } }, { $group: { _id: "$group_id", highest: { $max: "$count"}, field1: { "$max": "$field1"}, field2: { "$max":"$field2"}, }, }, { $sort: { highest : -1 } } ]);
With this edit, in the group stage only the parents will have values for field1
, field2
, other documents will have null
values. Than we can do a $max
of them, to get the only value, the parent value.
Result will be the same like above, and field1
, field2
will have values from parent documents