Importing Flat Files with embedded text qualifiers

Importing Flat Files with Embedded Text Qualifiers to SQL Server

Some systems (especially Linux) can generate flat files where a text qualifier is escaped within quoted text. Think about a list of plumbing parts where you have the SKU number and description. Some of these lines could look like the following in a database:

SKU Number Description
48592521 4" Pipe, Reducer
845925 2 1/4" Coupling

 

When you need to export this to a flat file such as a csv, the system usually encapsulates text in a text qualifier, which is typically a quote mark (“). In the situation above, the data would get written out to a file as follows:

SKU Number,SKU Description
48592521,"4\" Pipe, Reducer"
845925,"2 1/4\" Coupling"

In this situation, you can see that the embedded text qualifier is escaped with a backslash (\). However, most ETL tools cannot handle the escaped text qualifier and would instead assume that the escaped text qualifier was the end of the field, which would result in an error importing the file to another database.

How do we handle an embedded text qualifier?

To get around this though with EDIS, we will leverage the procedure usp_replace_file_content. This procedure will replace a string in a file with another string. What you can do prior to importing the data is to replace the escaped text qualifier with some other marker that can be processed later with an UPDATE statement in SQL Server after the data is imported. In our example above, we will use usp_replace_file_content to replace the \” with a new marker called {EMBEDDED_TEXT_QUAL}. Here is the procedure example to replace the content.


EXEC SSISDB.edis.usp_replace_file_content @file_path = 'C:\temp\plumbing_parts.csv'
,@orig_string = '\"'
,@replace_string = '{EMBEDDED_TEXT_QUAL}'
,@backup_file_nm = 'plumbing_parts_orig.csv'

 

Notice that we also specified a backup file, so that we can preserve the original file content in case something goes wrong. Once the replacment proc has ran, the file now looks like this:

SKU Number,SKU Description
48592521,"4{EMBEDDED_TEXT_QUAL} Pipe, Reducer"
845925,"2 1/4{EMBEDDED_TEXT_QUAL} Coupling"

Now that the embedded quotation has been replaced, we can import the data simly by running usp_run_data_transfer as shown below.


EXEC SSISDB.EDIS.usp_run_data_transfer
 @src_sys = 'FLATFILE'
,@dest_sys = @@servername
,@dest_tbl = '##raw_data'
,@crt_dest_tbl = 1
,@file_path = 'C:\Temp\plumbing_parts.csv'
,@col_delim = ','
,@text_qual = '"'

 

Now that the data has been imported into the SQL Server, we have one more step to do – We have to update our text qualifier marker with the ” symbol so that it resembles the original description from the source. That is done with this simple T-SQL update statement:

UPDATE ##raw_data SET [SKU Description] = REPLACE([SKU Description], '{EMBEDDED_TEXT_QUAL}', '"')
 

That’s it!
Below are links to the example CSV file as well as the full T-SQL to import the data with EDIS. Enjoy and remember that all features that were demonstrated in this blog are FREE in EDIS Standard edition. Download EDIS Standard today and give it a try!

Plumbing Parts CSV File
usp_import_flat_file_with_escape_text

Leave a Reply

Your email address will not be published. Required fields are marked *