Importing Flat Files with embedded text qualifiers

Importing Flat Files with Embedded Text Qualifiers to SQL Server

Some systems (especially on Linux) generate flat files in which a text qualifier that appears inside quoted text is escaped. Consider a list of plumbing parts with a SKU number and a description. In a database, a few rows might look like this:

SKU Number    Description
48592521      4" Pipe, Reducer
845925        2 1/4" Coupling

 

When you export this to a flat file such as a CSV, the system usually wraps text values in a text qualifier, which is typically a double quote ("). The data above would be written to the file as follows:

SKU Number,SKU Description
48592521,"4\" Pipe, Reducer"
845925,"2 1/4\" Coupling"

Here the embedded text qualifier is escaped with a backslash (\). However, many ETL tools do not recognize the backslash escape and treat the embedded text qualifier as the end of the field. The rest of the value then spills into extra columns, which results in an error when importing the file into another database.

How do we handle an embedded text qualifier?

To get around this with EDIS, we will use the procedure usp_replace_file_content, which replaces one string in a file with another. Before importing the data, we replace the escaped text qualifier with a marker, and after the import we turn the marker back into a quote with an UPDATE statement in SQL Server. Pick a marker that does not otherwise occur in the data. In our example above, we will use usp_replace_file_content to replace \" with a marker called {EMBEDDED_TEXT_QUAL}. Here is the procedure call that replaces the content:


EXEC SSISDB.edis.usp_replace_file_content @file_path = 'C:\temp\plumbing_parts.csv'
,@orig_string = '\"'
,@replace_string = '{EMBEDDED_TEXT_QUAL}'
,@backup_file_nm = 'plumbing_parts_orig.csv'

 

Notice that we also specified a backup file, so that the original file content is preserved in case something goes wrong. Once the replacement procedure has run, the file looks like this:

SKU Number,SKU Description
48592521,"4{EMBEDDED_TEXT_QUAL} Pipe, Reducer"
845925,"2 1/4{EMBEDDED_TEXT_QUAL} Coupling"
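If you want to preview what the replacement does to a single line before touching the file, the same substitution can be sketched with the built-in T-SQL REPLACE function. This is only an illustration, not how usp_replace_file_content works internally; the variable names are made up for the example, and the sample line is copied from the file above.

```sql
-- Illustrative only: preview the substitution on one sample line.
-- @line and @marker are hypothetical names used just for this sketch.
DECLARE @line   nvarchar(200) = N'48592521,"4\" Pipe, Reducer"';
DECLARE @marker nvarchar(50)  = N'{EMBEDDED_TEXT_QUAL}';

-- Guard: the marker should not already occur in the data,
-- otherwise the later UPDATE would turn genuine text into a quote.
IF CHARINDEX(@marker, @line) > 0
    RAISERROR('Marker already present in the data; choose another marker.', 16, 1);

-- Replace the escaped qualifier (\") with the marker.
SELECT REPLACE(@line, N'\"', @marker) AS cleaned_line;
-- cleaned_line: 48592521,"4{EMBEDDED_TEXT_QUAL} Pipe, Reducer"
```

After the replacement, the only quotes left on the line are the real text qualifiers, so the comma inside the description stays safely within the quoted field.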

Now that the embedded quotation mark has been replaced, we can import the data simply by running usp_run_data_transfer as shown below.


EXEC SSISDB.EDIS.usp_run_data_transfer
 @src_sys = 'FLATFILE'
,@dest_sys = @@servername
,@dest_tbl = '##raw_data'
,@crt_dest_tbl = 1
,@file_path = 'C:\Temp\plumbing_parts.csv'
,@col_delim = ','
,@text_qual = '"'

 

Now that the data has been imported into SQL Server, we have one more step to do: replace our text qualifier marker with the " symbol so that the description matches the original from the source. That is done with this simple T-SQL update statement:

UPDATE ##raw_data SET [SKU Description] = REPLACE([SKU Description], '{EMBEDDED_TEXT_QUAL}', '"')
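To try the restore step without running an import, here is a minimal sketch that uses a local temp table as a stand-in for ##raw_data. The table name #raw_data_demo and the nvarchar column types are assumptions made for the example; the columns EDIS actually creates may differ. It also adds a quick check that no markers are left behind.

```sql
-- Stand-in for the imported table (name and column types are assumptions).
CREATE TABLE #raw_data_demo (
    [SKU Number]      nvarchar(50),
    [SKU Description] nvarchar(255)
);

INSERT INTO #raw_data_demo ([SKU Number], [SKU Description])
VALUES (N'48592521', N'4{EMBEDDED_TEXT_QUAL} Pipe, Reducer'),
       (N'845925',   N'2 1/4{EMBEDDED_TEXT_QUAL} Coupling');

-- Restore the quote. CHARINDEX is used instead of LIKE because the
-- underscores in the marker would act as single-character wildcards in LIKE.
UPDATE #raw_data_demo
SET [SKU Description] = REPLACE([SKU Description], N'{EMBEDDED_TEXT_QUAL}', N'"')
WHERE CHARINDEX(N'{EMBEDDED_TEXT_QUAL}', [SKU Description]) > 0;

-- Sanity check: no row should still contain the marker.
SELECT COUNT(*) AS rows_still_marked
FROM #raw_data_demo
WHERE CHARINDEX(N'{EMBEDDED_TEXT_QUAL}', [SKU Description]) > 0;

-- Inspect the restored descriptions.
SELECT [SKU Number], [SKU Description] FROM #raw_data_demo;

DROP TABLE #raw_data_demo;
```

The WHERE clause on the UPDATE is optional. It simply limits the update to rows that actually contain the marker, which can matter on larger tables.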
 

That’s it!
Below are links to the example CSV file as well as the full T-SQL to import the data with EDIS. Enjoy and remember that all features that were demonstrated in this blog are FREE in EDIS Standard edition. Download EDIS Standard today and give it a try!

Plumbing Parts CSV File
usp_import_flat_file_with_escape_text

Processing Complex Text Files with SQL Server

Many organizations work with files that contain header information at the top of the file before the data rows begin. This header typically carries critical information about the file, such as when it was processed, transaction identity markers or batch IDs, and other important details. To retrieve this information, IT departments usually have to write custom scripts, which are hard to maintain because the files can change over time. With EDIS, though, processing files like this is simple, and the processing is quick to change when need be.

Consider an example file named “bank_file.txt”. Before the actual data rows begin, the top of this file contains 4 key pieces of information: the run date, run time, batch ID, and wave ID. Here is what the file looks like when opened:

Run Date: 2017-04-12
Run Time: 08:07AM

Batch ID: x4372

Wave ID: 411928257282

<< BEGIN DATA LOAD >>

TransactionID ProductID ReferenceOrderID ReferenceOrderLineID TransactionDate TransactionType Quantity ActualCost ModifiedDate
100000 784 41590 0 2013-07-31 00:00:00 W 2 0 2013-07-31 00:00:00
100001 794 41591 0 2013-07-31 00:00:00 W 1 0 2013-07-31 00:00:00

As you can see, the first 11 lines of the file hold the information we need, plus whitespace, before the actual data rows begin. To retrieve the 4 pieces of information, we will use the EDIS function ufn_read_file_line, which reads a specific line from a text file. To get the run date, we can use the function as follows:


DECLARE @file_path nvarchar(1000) = 'C:\bin\stage files\bank_file.txt';
-- The run date is on line 2 of the file
DECLARE @run_dt_line nvarchar(max) = SSISDB.edis.ufn_read_file_line(@file_path, 2);
-- Now remove the text "Run Date: " and cast the result as a date
DECLARE @run_dt date = CAST(REPLACE(@run_dt_line, 'Run Date: ', '') AS date);
PRINT @run_dt;
-- Output: 2017-04-12

 

Simple, right? Now imagine that one day the file arrives with the run date on line 4 instead of line 2. To fix this, simply change the second argument of EDIS.ufn_read_file_line from 2 to 4, and that’s it.
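The other header values can be pulled apart the same way. The sketch below is one possible approach, not part of EDIS: instead of removing each label by name, it takes everything after the first colon. The three string literals stand in for what ufn_read_file_line would return, and the variable names are made up for the example.

```sql
-- Stand-ins for lines read from the file (assumed to match the sample above).
DECLARE @run_dt_line nvarchar(max) = N'Run Date: 2017-04-12';
DECLARE @batch_line  nvarchar(max) = N'Batch ID: x4372';
DECLARE @wave_line   nvarchar(max) = N'Wave ID: 411928257282';

-- Take the text after the first colon and trim surrounding spaces.
-- If a line has no colon, CHARINDEX returns 0 and the whole line is kept.
DECLARE @run_dt date = CAST(LTRIM(RTRIM(
    SUBSTRING(@run_dt_line, CHARINDEX(N':', @run_dt_line) + 1, LEN(@run_dt_line)))) AS date);

DECLARE @batch_id nvarchar(50) = LTRIM(RTRIM(
    SUBSTRING(@batch_line, CHARINDEX(N':', @batch_line) + 1, LEN(@batch_line))));

-- The wave ID in the sample is too large for int, so bigint is used here.
DECLARE @wave_id bigint = CAST(LTRIM(RTRIM(
    SUBSTRING(@wave_line, CHARINDEX(N':', @wave_line) + 1, LEN(@wave_line)))) AS bigint);

SELECT @run_dt AS run_dt, @batch_id AS batch_id, @wave_id AS wave_id;
```

Splitting on the first colon means a small change to a label's wording does not break the parsing. It still works for the run time line, because the first colon there is the one after the label.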

Below are links to the example text file and a stored procedure to process it in full. Enjoy and remember that all features that were demonstrated in this blog are FREE in EDIS Standard edition. Download EDIS Standard today and give it a try!

Bank File
usp_import_flat_file_with_header_info