mongodb - Aggregate $lookup Total size of documents in matching pipeline exceeds maximum document size
I have a pretty simple $lookup aggregation query like the following:

{'$lookup': {'from': 'edge', 'localField': 'gid', 'foreignField': 'to', 'as': 'from'}}
When I run this on a match with enough documents I get the following error:

Command failed with error 4568: 'Total size of documents in edge matching { $match: { $and: [ { from: { $eq: "genedatabase:hugo" } }, {} ] } } exceeds maximum document size' on server
All of my attempts to limit the number of documents fail. allowDiskUse: true does nothing. Sending a cursor in does nothing. Adding in a $limit to the aggregation also fails.

How could this be?
And then, looking at the error again: where did that $match, $and, and $eq come from? Is the aggregation pipeline behind the scenes farming out the $lookup call to another aggregation, one which it runs on its own and which I have no ability to provide limits for or use cursors with?

What is going on here?
As stated earlier in comments, the error occurs because, when performing the $lookup, which by default produces a target "array" within the parent document from the results of the foreign collection, the total size of the documents selected for that array causes the parent to exceed the 16MB BSON limit.

The counter for this is to process with an $unwind which immediately follows the $lookup pipeline stage. This actually alters the behavior of $lookup such that instead of producing an array in the parent, the results are instead a "copy" of each parent for every document matched.
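The difference between the two output shapes can be sketched with a plain in-memory simulation, no server required. This is not a MongoDB API; the helper names lookupAsArray and lookupUnwound and the sample data are made up purely for illustration:

```javascript
// Hypothetical in-memory stand-ins for the two collections.
const source = [{ _id: 1 }];
const edge = [
  { _id: 1, gid: 1 },
  { _id: 2, gid: 1 }
];

// Default $lookup shape: one parent carrying an embedded "results"
// array. The parent document grows with every matched foreign
// document, which is how the 16MB BSON limit gets exceeded.
function lookupAsArray(parents, foreign) {
  return parents.map(p => ({
    ...p,
    results: foreign.filter(f => f.gid === p._id)
  }));
}

// Coalesced $lookup + $unwind shape: one "copy" of the parent per
// matched foreign document, so no single document accumulates the
// whole join.
function lookupUnwound(parents, foreign) {
  return parents.flatMap(p =>
    foreign.filter(f => f.gid === p._id)
           .map(f => ({ ...p, results: f }))
  );
}

console.log(lookupAsArray(source, edge).length); // 1 parent document
console.log(lookupUnwound(source, edge).length); // 2 copies of the parent
```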
This is pretty much just like regular usage of $unwind, with the exception that instead of processing as a "separate" pipeline stage, the unwinding action is actually added to the $lookup pipeline operation itself. Ideally you also follow the $unwind with a $match condition, which also creates a matching argument to be added to the $lookup. You can actually see this in the explain output for the pipeline.
The topic is actually covered (briefly) in the Aggregation Pipeline Optimization section of the core documentation:

$lookup + $unwind Coalescence

New in version 3.2.

When a $unwind immediately follows another $lookup, and the $unwind operates on the as field of the $lookup, the optimizer can coalesce the $unwind into the $lookup stage. This avoids creating large intermediate documents.
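Concretely, the coalescence means that a pipeline written as three stages is executed as a single stage. The following sketch (plain data structures, with stage contents mirroring the explain output reproduced later in this answer) contrasts the written form with the form the optimizer actually runs:

```javascript
// The pipeline as written by the application: three separate stages.
const written = [
  { $lookup: { from: 'edge', localField: '_id', foreignField: 'gid', as: 'results' } },
  { $unwind: '$results' },
  { $match: { 'results._id': { $gte: 1, $lte: 5 } } }
];

// What the optimizer executes: one $lookup stage carrying "unwinding"
// and "matching" arguments, with the $match conditions rewritten to
// apply against the foreign collection's own fields.
const executed = [
  { $lookup: {
      from: 'edge',
      localField: '_id',
      foreignField: 'gid',
      as: 'results',
      unwinding: { preserveNullAndEmptyArrays: false },
      matching: { $and: [ { _id: { $gte: 1 } }, { _id: { $lte: 5 } } ] }
  } }
];

console.log(written.length, executed.length); // 3 1
```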
This is best demonstrated with a listing that puts the server under stress by creating "related" documents that would exceed the 16MB BSON limit. Done as briefly as possible to both break and work around the BSON limit:
const MongoClient = require('mongodb').MongoClient;

const uri = 'mongodb://localhost/test';

function data(data) {
  console.log(JSON.stringify(data, undefined, 2))
}

(async function() {

  let db;

  try {
    db = await MongoClient.connect(uri);

    console.log('Cleaning....');
    // Clean data
    await Promise.all(
      ["source","edge"].map( c => db.collection(c).remove() )
    );

    console.log('Inserting...')

    await db.collection('edge').insertMany(
      Array(1000).fill(1).map((e,i) => ({ _id: i+1, gid: 1 }))
    );
    await db.collection('source').insert({ _id: 1 })

    console.log('Fattening up....');
    await db.collection('edge').updateMany(
      {},
      { $set: { data: "x".repeat(100000) } }
    );

    // The full pipeline. The failing test uses only the $lookup stage
    let pipeline = [
      { $lookup: {
        from: 'edge',
        localField: '_id',
        foreignField: 'gid',
        as: 'results'
      }},
      { $unwind: '$results' },
      { $match: { 'results._id': { $gte: 1, $lte: 5 } } },
      { $project: { 'results.data': 0 } },
      { $group: { _id: '$_id', results: { $push: '$results' } } }
    ];

    // List and iterate each test case
    let tests = [
      'Failing.. Size exceeded...',
      'Working.. Applied $unwind...',
      'Explain output...'
    ];

    for (let [idx, test] of Object.entries(tests)) {
      console.log(test);

      try {
        let currpipe = (( +idx === 0 ) ? pipeline.slice(0,1) : pipeline),
            options = (( +idx === tests.length-1 ) ? { explain: true } : {});

        await new Promise((end,error) => {
          let cursor = db.collection('source').aggregate(currpipe,options);
          for ( let [key, value] of Object.entries({ error, end, data }) )
            cursor.on(key,value);
        });
      } catch(e) {
        console.error(e);
      }
    }

  } catch(e) {
    console.error(e);
  } finally {
    db.close();
  }

})();
After inserting the initial data, the listing will attempt to run an aggregate merely consisting of the $lookup, which will fail with the following error:

{ MongoError: Total size of documents in edge matching pipeline { $match: { $and : [ { gid: { $eq: 1 } }, {} ] } } exceeds maximum document size

Which is basically telling you that the BSON limit was exceeded on retrieval.
By contrast, the next attempt adds the $unwind and $match pipeline stages.
The explain output:

  {
    "$lookup": {
      "from": "edge",
      "as": "results",
      "localField": "_id",
      "foreignField": "gid",
      "unwinding": {                        // $unwind now is "unwinding"
        "preserveNullAndEmptyArrays": false
      },
      "matching": {                         // $match now is "matching"
        "$and": [                           // and is actually executed
          {                                 // against the foreign collection
            "_id": { "$gte": 1 }
          },
          {
            "_id": { "$lte": 5 }
          }
        ]
      }
    }
  },
  // $unwind and $match stages removed from the pipeline
  { "$project": { "results": { "data": false } } },
  { "$group": { "_id": "$_id", "results": { "$push": "$results" } } }
And that result of course succeeds, because the results are no longer being placed into the parent document, so the BSON limit cannot be exceeded.
This really just happens as a result of adding the $unwind only, but the $match is added in the example to show that it is also added into the $lookup stage, and that the overall effect is to "limit" the results returned in an effective way, since it is all done within that $lookup operation and no results other than those matching are returned.

By constructing the pipeline in this way you can query for "referenced data" that would otherwise exceed the BSON limit, and then, if you want the results back in an array format, $group them once they have been effectively filtered by that "hidden query" actually being performed by the $lookup.