sql - Fetch big tree without overhead in left tables
My question is a more theoretical one: it is about why RDBMS servers/drivers return data the way they do, not about how to find the correct result set, nor how to find it fast. I'm pretty familiar with SQL, but there is one thing that has always annoyed my sense of economy.
Consider the following "class" graph:
a { field1, ..., field9, b_items = [ b1, ..., bn ] }
b { field1, ..., field6, c_items = [ c1, ..., cm ] }
c { field1, field2 }
We have a few a objects, each a object has many b objects, and each b object has lots of c objects, so count(a) < count(b) << count(c).
Now we use an RDBMS to store it, because relations are cool and optimizers are smart, so it can do virtually anything in milliseconds, provided there is a good plan and an index set.
I'll skip the table creation code, it should be obvious, and go straight to the select:
select *
from a
left join b on b.a_id = a.id
left join c on c.b_id = b.id
where whatever
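(Roughly, the skipped schema would be something like the sketch below, with foreign-key indexes so the joins stay cheap; column types are just placeholders:)

create table a (id serial primary key, field1 text, /* ... */ field9 text);
create table b (id serial primary key, a_id int references a(id), field1 text, /* ... */ field6 text);
create table c (id serial primary key, b_id int references b(id), field1 text, field2 text);

-- indexes on the join keys, so the left joins don't need full scans
create index on b (a_id);
create index on c (b_id);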
The database server returns a result set that combines the columns of all tables, joined into a sort-of tree:
a.f1 .... a.f9   b.f1 .... b.f6   c.f1   c.f2
---------------------------------------------------
1     1     1     1     1     1     1      1
1     1     1     1     1     1     2      2
1     1     1     1     1     1     3      3
... more rows ...
1     1     1     1     1     1     999    999
↓
1     1     1     2     2     2     1      1
1     1     1     2     2     2     2      2
... more rows ...
1     1     1     2     2     2     999    999
... lots of rows ...
1     1     1     99    99    99    999    999
↓
2     2     2     ...                          -- oh there it is, a[2]
...
5     5     5     null  null  null  null   null  -- a[5] has no b_items
...
9     9     9     ...
The problem is, if a has many columns with text, json, or other heavy data, they are duplicated thousands of times to match each product row of the a+b+c join. Why don't SQL servers at least not send me the same {a,b}-rows after the first one in a join group? Ideally, I'd like to see a result like:
[
  { <a-fields>,
    b = [
      { <b-fields>,
        c = [ { <c-fields> }, ... more c rows ]
      },
      ... more b rows
    ]
  },
  ... more rows
]
which pretty much resembles what I need in memory on the client side. I know I can make more queries to fetch less data, e.g. via a.id in (ids...) (roughly as sketched below) or a stored proc returning nulls on the parasite rows, but isn't the relational model intended for one-shot access? Round trips are heavy, and the planner only guesses. And real data graphs are more than 3 levels deep (consider 5-10). Why not do it in a single pass, without excessive traffic?
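The multi-query workaround I mean would look something like this, one round trip per level, with the ids collected on the client between queries (column names follow the select above):

select * from a where whatever;
select * from b where b.a_id in (/* ids of the a rows just fetched */);
select * from c where c.b_id in (/* ids of the b rows just fetched */);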
I'm fine with duplicate cells in the a and b columns, because there aren't that many of them. Maybe I'm missing something mainstream, SQL-ish and non-hacky that Google has been hiding from me for many years.
Thanks!
The way to avoid duplicated data transfer is to use aggregate functions such as string_agg() or array_agg(). You can also aggregate data using jsonb functions, so you can get a single json object instead of tabular data, for example:
select jsonb_agg(taba)
from (
    select to_jsonb(taba) || jsonb_build_object('tabb', jsonb_agg(tabb)) taba
    from taba
    left join (
        select to_jsonb(tabb) || jsonb_build_object('tabc', jsonb_agg(to_jsonb(tabc))) tabb
        from tabb
        join tabc on tabc.bid = tabb.id
        group by tabb.id
        ) tabb
    on (tabb->>'aid')::int = taba.id
    group by taba.id
    ) taba
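If a full json document is more than you need, the same idea works one level at a time with plain array_agg(), collapsing the c rows into one array per b row. This is just a sketch; it assumes id is the primary key of each table and that tabc has a field1 column:

-- each b row (and therefore each a row) is transferred only once per group;
-- Postgres allows selecting tabb.aid here because tabb.id is the primary key
select tabb.id as b_id, tabb.aid as a_id, array_agg(tabc.field1) as c_field1s
from tabb
join tabc on tabc.bid = tabb.id
group by tabb.id;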