Azure Data Lake - U-SQL parallel reading from a SQL table
I have a scenario in which I am ingesting data from an MS SQL DB into Azure Data Lake using U-SQL. The table is quite big, with over 16 million records (soon many more). The query is a simple SELECT a, b, c FROM dbo.mytable;
I realized, however, that only 1 vertex is used to read from the table.
My question is: is there any way to leverage parallelism while reading from a SQL table?
I don't believe parallelism for external data sources is supported yet in U-SQL (although I am happy to be corrected). If you feel this is an important missing feature, you can create a request and vote for it here:
https://feedback.azure.com/forums/327234-data-lake
As a workaround, you could manually parallelise your queries, depending on the columns available in your datasource, eg by date:
    // External query working
    USE DATABASE youradladb;

    // Create the external query for the year 2016
    @results2016 =
        SELECT *
        FROM EXTERNAL yoursqldbdatasource EXECUTE @"SELECT * FROM dbo.yourbigtable WITH (NOLOCK) WHERE yourdatecol BETWEEN '1 Jan 2016' AND '31 Dec 2016'";

    // Create the external query for the year 2017
    @results2017 =
        SELECT *
        FROM EXTERNAL yoursqldbdatasource EXECUTE @"SELECT * FROM dbo.yourbigtable WITH (NOLOCK) WHERE yourdatecol BETWEEN '1 Jan 2017' AND '31 Dec 2017'";

    // Output the 2016 results
    OUTPUT @results2016
    TO "/output/bigtable/results2016.csv"
    USING Outputters.Csv();

    // Output the 2017 results
    OUTPUT @results2017
    TO "/output/bigtable/results2017.csv"
    USING Outputters.Csv();
Now, you have created a different issue by breaking the data up into multiple files. However, you can then read these using filesets, which will also parallelise, eg:
    @input =
        EXTRACT ... // your column list
        FROM "/output/bigtable/results{year}.csv"
        USING Extractors.Csv();
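To make the fileset read a little more concrete, here is a minimal sketch. The columns a, b and c are borrowed from the SELECT in the question, and their types are guesses, as is the output path; substitute your real column list. The {year} token in the path is declared as an extra column in the EXTRACT, so each row records which file it came from. Filtering on that column with a constant should, as far as I know, let U-SQL skip the files that do not match.

```usql
// Sketch only: column names and types are assumptions, not the real schema.
@input =
    EXTRACT a int,
            b string,
            c string,
            year int    // virtual column filled in from the {year} part of the file name
    FROM "/output/bigtable/results{year}.csv"
    USING Extractors.Csv();

// Filtering on the virtual column restricts the read to the matching file(s).
@only2016 =
    SELECT a, b, c
    FROM @input
    WHERE year == 2016;

// Hypothetical output path, for illustration only.
OUTPUT @only2016
TO "/output/bigtable/filtered2016.csv"
USING Outputters.Csv();
```

Dropping the WHERE clause reads every file that matches the pattern.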
I would ask why you are choosing to move such a large file into your lake, given that ADLA and U-SQL offer you the ability to query data where it lives. Can you explain further?