Using * in OPENROWSET to use Data Lake metadata


Jul 5

Use the values matched by wildcards in your SQL query. It is a powerful way to turn file path or file name metadata into actual data!

Let’s say you stored your files like this:
yellow_tripdata_<year>-<month>.csv

For example, yellow_tripdata_2022-01.csv

Let’s fetch the year and month values and turn them into SQL columns in two simple steps (plus a bonus tip).

1) Add wildcards in OPENROWSET()

Replacing the dynamic parts of the file name (the year and the month) with wildcards makes the query read every matching file. It also lets us reference the matched values as data later on.

Code example:

FROM OPENROWSET(
BULK 'csv/taxi/yellow_tripdata_*-*.csv'
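
For context, here is a minimal sketch of what the complete OPENROWSET call might look like. The data source name MyDataLake is hypothetical, and the CSV options (parser version 2.0, a header row) are assumptions about the files rather than something the snippet above specifies.

```sql
-- Sketch only: 'MyDataLake' is a hypothetical external data source
-- pointing at the storage container that holds the csv/taxi folder.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'csv/taxi/yellow_tripdata_*-*.csv',  -- first * = year, second * = month
    DATA_SOURCE = 'MyDataLake',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE                         -- assumes the files start with a header row
) AS r;
```

The alias r matters: it is what the filepath() calls in the next step are attached to.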

2) Use wildcards in SELECT with filepath()

The filepath() function returns the full path of the file each row was read from. If you pass a number as an argument, it instead returns the value matched by the corresponding wildcard from step 1.

filepath(1) returns the value of the first wildcard, filepath(2) the second, and so on.

Code example:

SELECT
r.filepath() AS filepath
,r.filepath(1) AS [year]
,r.filepath(2) AS [month]
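
Putting steps 1 and 2 together, a full query could look roughly like the sketch below, which counts rows per file. The data source name MyDataLake and the CSV options are assumptions, not part of the original snippets.

```sql
-- Sketch only: counts the rows in each file and exposes the wildcard
-- values as columns. 'MyDataLake' is a hypothetical data source.
SELECT
    r.filepath()  AS filepath
    ,r.filepath(1) AS [year]    -- value matched by the first *
    ,r.filepath(2) AS [month]   -- value matched by the second *
    ,COUNT_BIG(*)  AS row_count
FROM OPENROWSET(
    BULK 'csv/taxi/yellow_tripdata_*-*.csv',
    DATA_SOURCE = 'MyDataLake',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    HEADER_ROW = TRUE           -- assumes the files start with a header row
) AS r
GROUP BY
    r.filepath()
    ,r.filepath(1)
    ,r.filepath(2);
```

For a file named yellow_tripdata_2022-01.csv, the [year] column would hold 2022 and the [month] column 01, both as text.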

3) Bonus tip - Works in the WHERE clause

Wildcard references are not limited to SELECT; they also work in the WHERE clause. This means you can use the metadata in your file path or file name to filter your query.
Quite powerful!

Code example:

WHERE
r.filepath(1) IN ('2017')
AND r.filepath(2) IN ('10', '11', '12')

It even works if you filter on the column alias instead, even after the value has been cast to another type.
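
T-SQL does not let a WHERE clause see an alias defined in the same SELECT list, so one way to read the remark about aliases is that the cast columns are exposed through a CTE (or a view) and filtered from outside. The sketch below assumes that reading; the CTE name trips, the data source MyDataLake, and the CSV options are all hypothetical.

```sql
-- Sketch only: cast the wildcard values to INT inside a CTE,
-- then filter on the aliases from the outer query.
WITH trips AS (
    SELECT
        CAST(r.filepath(1) AS INT) AS [year]
        ,CAST(r.filepath(2) AS INT) AS [month]
    FROM OPENROWSET(
        BULK 'csv/taxi/yellow_tripdata_*-*.csv',
        DATA_SOURCE = 'MyDataLake',   -- hypothetical data source
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        HEADER_ROW = TRUE             -- assumes the files start with a header row
    ) AS r
)
SELECT
    [year]
    ,[month]
    ,COUNT_BIG(*) AS row_count
FROM trips
WHERE
    [year] = 2017
    AND [month] IN (10, 11, 12)
GROUP BY
    [year]
    ,[month];
```

Because the values are now integers, the filter compares against numbers rather than the string literals used in the earlier WHERE example.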
And that’s it! A neat little trick with lots of utility in the right hands.


Mathias Halkjaer

FLUXBI