This is my first time using MongoDB, and I don't know how to create references (the equivalent of relationships in SQL) between documents that are already in my database.
I have two collections. The first is called Films, and its documents hold information about films (title, a unique URL, description, ...). Here is an example document:
{
"_id": {
"$oid": "6272aa886441c51b18de7b23"
},
"type": "Film",
"title": "Les 2 papas et la maman",
"genres": "Comedia",
"description": "Jérôme and Delphine want a child but Jerome is sterile. They then ask the best friend of Jerome, Salim, to be the donor for artificial insemination of the mother...",
"platform": "Netflix",
"film_url": "exampleurl.com"
}
The second collection is called "Actors". Each document in it describes one actor, the film he or she appears in (title and unique URL), and the character he or she plays. A document in this collection might look like this:
{
"_id": {
"$oid": "6272ac5b6441c51b18de9ee4"
},
"name": "Sophie Rundle",
"film_title": "Peaky Blinders",
"film_url": "exampleurl.com",
"character": "Ada Shelby",
"num_episodes": "36"
}
I want to create a one-to-many reference between the Films collection and the Actors collection (a film has many actors, and each actor document represents one character in one film) by adding an array to each Film document that contains the ids of the actors who appear in that film. Both collections share the unique field "film_url", and I have two CSV files with the data, so I could read and iterate over them to create the references. That doesn't seem efficient, though, since each file has more than 10,000 lines.
Is there a simpler and more efficient way to create these references in MongoDB?
Here's one way you could possibly add and populate an "actorIds" array in the Films collection. If your collections are large, I expect this will take some time since each film will need to get its actors. Indexes will probably be useful.
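As a sketch of the index idea, assuming the collection and field names from the question, an index on the field the lookup matches against could be created in mongosh like this. Whether it actually helps depends on your data, so it is worth checking before relying on it.

```javascript
// Index the foreign field of the $lookup so each film can find its actors
// without scanning the whole Actors collection.
db.Actors.createIndex({ film_url: 1 })
```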
Before using this, I would evaluate MongoDB's warning about "output to the same collection that is being aggregated". Infinite loops are not fun, but I doubt that would be an issue with this pipeline. You should make your own decision about using this - it's your responsibility (not mine!). Expert analysis/opinions are welcome.
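db.Films.aggregate([
  // Join each film with its actors on the shared film_url, keeping only each actor's _id.
  // Combining localField/foreignField with "pipeline" requires MongoDB 5.0 or later.
  {
    "$lookup": {
      "from": "Actors",
      "localField": "film_url",
      "foreignField": "film_url",
      "pipeline": [
        {
          "$project": {
            "_id": 1
          }
        }
      ],
      "as": "actorIds"
    }
  },
  // Flatten [{ _id: ... }, ...] into a plain array of ids.
  {
    "$set": {
      "actorIds": {
        "$map": {
          "input": "$actorIds",
          "in": "$$this._id"
        }
      }
    }
  },
  // Write the result back into Films (documents are matched on _id by default).
  {
    "$merge": "Films"
  }
])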
Try it on mongoplayground.net.
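If it helps to see what the pipeline computes, here is a rough plain-JavaScript sketch of the same join on in-memory arrays. The sample documents and the function name `attachActorIds` are made up for illustration, and nothing here talks to MongoDB.

```javascript
// Plain-JavaScript sketch of what the pipeline computes; no database involved.
// The sample documents and the helper name attachActorIds are hypothetical.
function attachActorIds(films, actors) {
  // Group actor _ids by film_url in one pass (the role of $lookup + $project).
  const idsByUrl = new Map();
  for (const actor of actors) {
    if (!idsByUrl.has(actor.film_url)) idsByUrl.set(actor.film_url, []);
    idsByUrl.get(actor.film_url).push(actor._id);
  }
  // Give every film its array of ids, empty if nothing matched
  // (the role of $set with $map, followed by $merge).
  return films.map((film) => ({
    ...film,
    actorIds: idsByUrl.get(film.film_url) ?? [],
  }));
}

const films = [
  { _id: "f1", title: "Film A", film_url: "a.example" },
  { _id: "f2", title: "Film B", film_url: "b.example" },
];
const actors = [
  { _id: "a1", name: "Actor 1", film_url: "a.example" },
  { _id: "a2", name: "Actor 2", film_url: "a.example" },
  { _id: "a3", name: "Actor 3", film_url: "c.example" },
];

const result = attachActorIds(films, actors);
console.log(JSON.stringify(result.map((f) => [f.title, f.actorIds])));
// prints [["Film A",["a1","a2"]],["Film B",[]]]
```

A film with no matching actors ends up with an empty array, which is also what `$lookup` gives you, and an actor whose film_url matches no film is simply ignored.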