mongodb - Aggregate $lookup: Total size of documents in matching pipeline exceeds maximum document size


I have a fairly simple $lookup aggregation query like the following:

{ '$lookup': { 'from': 'edge', 'localField': 'gid', 'foreignField': 'to', 'as': 'from' } }

When I run this on a match with enough documents, I get the following error:

command failed with error 4568: 'Total size of documents in edge matching { $match: { $and: [ { from: { $eq: "genedatabase:hugo" } }, {} ] } } exceeds maximum document size' on server

All attempts to limit the number of documents fail. allowDiskUse: true does nothing. Sending a cursor in does nothing. Adding a $limit to the aggregation also fails.

How can this be?

Then I see the error again. Where did that $match, $and and $eq come from? Is the aggregation pipeline behind the scenes farming out the $lookup call to another aggregation, one which runs on its own and that I have no ability to provide limits for or use cursors with?

What is going on here?

As stated earlier in comments, the error occurs because when performing the $lookup, which by default produces a target "array" within the parent document from the results of the foreign collection, the total size of the documents selected for that array causes the parent to exceed the 16MB BSON limit.

The counter for this is to process with an $unwind which immediately follows the $lookup pipeline stage. This actually alters the behavior of $lookup so that instead of producing an array in the parent, the results are instead a "copy" of each parent for every document matched.

This is pretty much just like regular usage of $unwind, with the exception that instead of processing as a "separate" pipeline stage, the unwinding action is added to the $lookup pipeline operation itself. Ideally you also follow the $unwind with a $match condition, which likewise creates a matching argument to be added to the $lookup. You can actually see this in the explain output for the pipeline.
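As a sketch, the pipeline shape that makes this possible looks like the following. The field names mirror the question's query; the $match field here is purely illustrative, since the key requirement is only that $unwind immediately follows $lookup and targets its "as" field:

```javascript
// Pipeline shape eligible for $lookup + $unwind coalescence.
// The $unwind must directly follow $lookup and operate on the "as" field,
// so no large intermediate array is ever materialized in the parent.
const pipeline = [
  { $lookup: {
    from: 'edge',
    localField: 'gid',
    foreignField: 'to',
    as: 'from'
  }},
  // Folded into the $lookup as its "unwinding" option by the optimizer
  { $unwind: '$from' },
  // A $match on the joined field can also be hoisted into the $lookup
  // ('from._id' is an illustrative condition, not from the question)
  { $match: { 'from._id': { $gte: 1 } } }
];

// The coalescence condition: $unwind path equals '$' + the $lookup "as" name
console.log(pipeline[1].$unwind === '$' + pipeline[0].$lookup.as); // true
```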

The topic is actually covered (briefly) in the Aggregation Pipeline Optimization section of the core documentation:

$lookup + $unwind Coalescence

New in version 3.2.

When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.

This is best demonstrated with a listing that puts the server under stress by creating "related" documents that would exceed the 16MB BSON limit. Done as briefly as possible to both break and work around the BSON limit:

const MongoClient = require('mongodb').MongoClient;

const uri = 'mongodb://localhost/test';

// Simple "data" event handler that pretty-prints each emitted document
function data(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

(async function() {

  let db;

  try {
    db = await MongoClient.connect(uri);

    console.log('Cleaning....');
    // Clean data
    await Promise.all(
      ["source","edge"].map(c => db.collection(c).remove() )
    );

    console.log('Inserting...')

    await db.collection('edge').insertMany(
      Array(1000).fill(1).map((e,i) => ({ _id: i+1, gid: 1 }))
    );
    await db.collection('source').insert({ _id: 1 })

    console.log('Fattening up....');
    await db.collection('edge').updateMany(
      {},
      { $set: { data: "x".repeat(100000) } }
    );

    // The full pipeline. The failing test uses only the $lookup stage
    let pipeline = [
      { $lookup: {
        from: 'edge',
        localField: '_id',
        foreignField: 'gid',
        as: 'results'
      }},
      { $unwind: '$results' },
      { $match: { 'results._id': { $gte: 1, $lte: 5 } } },
      { $project: { 'results.data': 0 } },
      { $group: { _id: '$_id', results: { $push: '$results' } } }
    ];

    // List and iterate each test case
    let tests = [
      'Failing.. Size exceeded...',
      'Working.. Applied $unwind...',
      'Explain output...'
    ];

    for (let [idx, test] of Object.entries(tests)) {
      console.log(test);

      try {
        let currpipe = (( +idx === 0 ) ? pipeline.slice(0,1) : pipeline),
            options = (( +idx === tests.length-1 ) ? { explain: true } : {});

        await new Promise((end,error) => {
          let cursor = db.collection('source').aggregate(currpipe,options);
          for ( let [key, value] of Object.entries({ error, end, data }) )
            cursor.on(key,value);
        });
      } catch(e) {
        console.error(e);
      }

    }

  } catch(e) {
    console.error(e);
  } finally {
    db.close();
  }

})();

After inserting the initial data, the listing attempts to run an aggregate merely consisting of the $lookup, which fails with the following error:

{ MongoError: Total size of documents in edge matching pipeline { $match: { $and : [ { gid: { $eq: 1 } }, {} ] } } exceeds maximum document size

Which is basically telling you that the BSON limit was exceeded on retrieval.
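The numbers in the listing make the failure predictable: 1,000 joined documents carrying roughly 100KB of padding each comes to about 100MB, far beyond the 16MB BSON limit the single parent document would have to fit into. A quick back-of-the-envelope check:

```javascript
// Each "edge" document carries ~100 KB of padding ("x".repeat(100000)),
// and all 1000 of them match the single "source" parent via gid: 1.
const docCount = 1000;
const bytesPerDoc = 100000;            // the $set padding from the listing

const totalBytes = docCount * bytesPerDoc;      // 100,000,000 bytes
const BSON_LIMIT = 16 * 1024 * 1024;            // 16 MiB maximum document size

console.log(totalBytes > BSON_LIMIT);  // true: the joined array cannot fit
```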

By contrast, the next attempt adds the $unwind and $match pipeline stages.

The explain output:

  {
    "$lookup": {
      "from": "edge",
      "as": "results",
      "localField": "_id",
      "foreignField": "gid",
      "unwinding": {                        // $unwind now is "unwinding"
        "preserveNullAndEmptyArrays": false
      },
      "matching": {                         // $match now is "matching"
        "$and": [                           // and actually executed against
          {                                 // the foreign collection
            "_id": {
              "$gte": 1
            }
          },
          {
            "_id": {
              "$lte": 5
            }
          }
        ]
      }
    }
  },
  // $unwind and $match stages removed
  {
    "$project": {
      "results": {
        "data": false
      }
    }
  },
  {
    "$group": {
      "_id": "$_id",
      "results": {
        "$push": "$results"
      }
    }
  }

And that result of course succeeds, because as the results are no longer being placed into the parent document, the BSON limit cannot be exceeded.

This really just happens as a result of adding the $unwind only, but the $match is added for example to show that this is also folded into the $lookup stage, and that the overall effect is to "limit" the results returned in an effective way, since it is all done in that $lookup operation and no results other than those matching are actually returned.

By constructing the pipeline in this way you can query for "referenced data" that would otherwise exceed the BSON limit, and then, if you want, $group the results back into an array format once they have been filtered by that "hidden query" actually being performed by the $lookup.
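What that trailing $group does can be mimicked in plain JavaScript: after the $unwind, each parent/child pair is a separate document, and $group folds the already-filtered children back into an array keyed by the parent _id. A rough sketch (the sample documents here are hypothetical, shaped like the listing's output):

```javascript
// Simulated unwound, filtered output: one document per parent/child pair,
// as produced after the coalesced $lookup + $unwind + $match stages.
const unwound = [
  { _id: 1, results: { _id: 1 } },
  { _id: 1, results: { _id: 2 } },
  { _id: 1, results: { _id: 3 } }
];

// Plain-JS equivalent of:
//   { $group: { _id: '$_id', results: { $push: '$results' } } }
const grouped = Object.values(
  unwound.reduce((acc, doc) => {
    (acc[doc._id] = acc[doc._id] || { _id: doc._id, results: [] })
      .results.push(doc.results);
    return acc;
  }, {})
);

console.log(JSON.stringify(grouped));
// [{"_id":1,"results":[{"_id":1},{"_id":2},{"_id":3}]}]
```

Since the filtering already happened inside the $lookup, the re-grouped arrays are small enough to fit safely back into the parent documents.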

