Importing Flat Files with embedded text qualifiers

Importing Flat Files with Embedded Text Qualifiers to SQL Server

Some systems (especially Linux) can generate flat files where a text qualifier is escaped within quoted text. Think about a list of plumbing parts where you have the SKU number and description. Some of these lines could look like the following in a database:

SKU Number Description
48592521 4" Pipe, Reducer
845925 2 1/4" Coupling

 

When you need to export this to a flat file such as a csv, the system usually encapsulates text in a text qualifier, which is typically a quote mark (“). In the situation above, the data would get written out to a file as follows:

SKU Number,SKU Description
48592521,"4\" Pipe, Reducer"
845925,"2 1/4\" Coupling"

In this situation, you can see that the embedded text qualifier is escaped with a backslash (\). However, most ETL tools cannot handle the escaped text qualifier and would instead assume that the escaped text qualifier was the end of the field, which would result in an error importing the file to another database.

How do we handle an embedded text qualifier?

To get around this though with EDIS, we will leverage the procedure usp_replace_file_content. This procedure will replace a string in a file with another string. What you can do prior to importing the data is to replace the escaped text qualifier with some other marker that can be processed later with an UPDATE statement in SQL Server after the data is imported. In our example above, we will use usp_replace_file_content to replace the \” with a new marker called {EMBEDDED_TEXT_QUAL}. Here is the procedure example to replace the content.


EXEC SSISDB.edis.usp_replace_file_content @file_path = 'C:\temp\plumbing_parts.csv'
,@orig_string = '\"'
,@replace_string = '{EMBEDDED_TEXT_QUAL}'
,@backup_file_nm = 'plumbing_parts_orig.csv'

 

Notice that we also specified a backup file, so that we can preserve the original file content in case something goes wrong. Once the replacment proc has ran, the file now looks like this:

SKU Number,SKU Description
48592521,"4{EMBEDDED_TEXT_QUAL} Pipe, Reducer"
845925,"2 1/4{EMBEDDED_TEXT_QUAL} Coupling"

Now that the embedded quotation has been replaced, we can import the data simly by running usp_run_data_transfer as shown below.


EXEC SSISDB.EDIS.usp_run_data_transfer
 @src_sys = 'FLATFILE'
,@dest_sys = @@servername
,@dest_tbl = '##raw_data'
,@crt_dest_tbl = 1
,@file_path = 'C:\Temp\plumbing_parts.csv'
,@col_delim = ','
,@text_qual = '"'

 

Now that the data has been imported into the SQL Server, we have one more step to do – We have to update our text qualifier marker with the ” symbol so that it resembles the original description from the source. That is done with this simple T-SQL update statement:

UPDATE ##raw_data SET [SKU Description] = REPLACE([SKU Description], '{EMBEDDED_TEXT_QUAL}', '"')
 

That’s it!
Below are links to the example CSV file as well as the full T-SQL to import the data with EDIS. Enjoy and remember that all features that were demonstrated in this blog are FREE in EDIS Standard edition. Download EDIS Standard today and give it a try!

Plumbing Parts CSV File
usp_import_flat_file_with_escape_text

Processing Complex Text Files with SQL Server

Many organizations work with files that contain header information at the top of the file before the data rows begin. This header information typically contains critical information of the file such as when it was processed, any transaction identity markers or batch ID’s, and other important information. In order to retrieve this information, IT departments usually have to write custom scripts. This makes it challenging to manage as the files can change over time. With EDIS though, processing files like this are very simple and quick to change when need be.

Consider an example file such as “bank_file.txt”. This file contains 4 key pieces of information at the top of the file before the actual data rows begin which are the run date, run time, batch ID, and wave ID. Here is an example of what the file looks like when opened:

Run Date: 2017-04-12
Run Time: 08:07AM

Batch ID: x4372

Wave ID: 411928257282

<< BEGIN DATA LOAD >>

TransactionID ProductID ReferenceOrderID ReferenceOrderLineID TransactionDate TransactionType Quantity ActualCost ModifiedDate
100000 784 41590 0 2013-07-31 00:00:00 W 2 0 2013-07-31 00:00:00
100001 794 41591 0 2013-07-31 00:00:00 W 1 0 2013-07-31 00:00:00

As you can see in this file, the first 11 lines of the file are filled with the information we need and whitespace before the actual data rows begin. To retrieve the 4 pieces of information we need, we will use EDIS function ufn_read_file_line. This function will read a specific line from a text file, simple as that. To get the run date, we can use the function as follows:


DECLARE @file_path nvarchar(1000) = 'C:\bin\stage files\bank_file.txt';
-- The run date is on line 2 of the file
DECLARE @run_dt_line nvarchar(max) = SSISDB.edis.ufn_read_file_line(@file_path,2)
-- now remove the text "Run Date: " and cast the result as a date
DECLARE @run_dt date = CAST(REPLACE(@run_dt_line,'Run Date: ','') as DATE);
PRINT @run_dt
2017-04-12

 

Simple right? Imagine if one day when you receive the file, the line on which the run date is printed changes to line 4 instead of line 2? To fix this, simply change the second argument in EDIS.ufn_read_file_line from 2 to 4, and that’s it.

Below are links to the example text file and a stored procedure to process it in full. Enjoy and remember that all features that were demonstrated in this blog are FREE in EDIS Standard edition. Download EDIS Standard today and give it a try!

Bank File
usp_import_flat_file_with_header_info

 

 

 

Sending a Text Message from SQL Server

Sending a Text Message from SQL Server

Sending a text message from SQL Server is actually pretty easy. I had no idea, but text messages go just like a standard email does. As long as you know your service provider’s smtp host, you can use the MSDB..sp_send_dbmail proc and put the recipient as their phone number @smtp_provider.com i.e. “8888675309@vtext.com”

Here is the link to the code. It currently works for the US big 4 (Verizon/ATT/TMoble/Sprint). Note, if you do not have database mail enabled, you will first need to do that. Here is the BOL link for database mail: https://docs.microsoft.com/en-us/sql/relational-databases/database-mail/configure-database-mail

SQL Server Text Message Code – Enjoy!

usp_send_text_msg

Creating a Robust Database for Business Users Without Blocking

Over the years of building and maintaining business user facing databases for adHoc and reporting analysis, I’ve had numerous situations where I would need to truncate/reload a table or perform an update on it. However, if one of the business users was querying the table, I would have to wait for them to finish their data pull before I could do the update. This would lead to a lot of cat and mouse games and trying to contact the user if the query ran too long to suspend it so I could make changes. To overcome this, I came up with this strategy that has been working great. The proc to maintain this is attached, and any suggestions are welcome:

Strategy to Maintain a Robust SQL Database and avoid blocking

  1. Run a script hourly that takes each table and creates a corresponding view with the same columns and uses the NOLOCK hint
        • The view name is the same as the table but prefixed with a “v_”.
        • The view has a watermark applied at the top of the definition in the comments so that we can detect later if the view was created from this script in case it needs to be dropped.
        • Example: You have a table with the following structure:

       

    CREATE TABLE dbo.dim_departments
    (
    dpt_id int primary key
    ,last_upd_ts smalldatetime not null
    ,dept_nm varchar(50)
    )
    The corresponding view that the script creates will look like this:
    /*
    *** AUTO VIEW GENERATION PROCESS ***
    VIEW WAS AUTO GENERATED FROM proc usp_update_bus_user_permissions on 2017-07-05 12:00:16.2866
    DO NOT ALTER THIS VIEW. IT WILL BE OVERWRITTEN HOURLY.
    IF YOU NEED TO ADD COLUMNS, CREATE ANOTHER VIEW prefixed with "vt_"
    */
    CREATE VIEW dbo.v_dim_departments as
    SELECT
    [dpt_id]
    ,[last_upd_ts]
    ,[dept_nm]
    FROM dbo.dim_departments WITH (NOLOCK)

  2. The script then grants SELECT and VIEW DEFINITION to the business user account on the view, but not the base table
  3. After all views have been created/updated, the script checks for views from the auto-generation process (prefixed with “v_” and the watermark) that are unresolvable due to syntax errors or the original base table getting dropped. When these views are detected, they get dropped from the database. This helps maintain a database that is self-cleaning and provides users with views that will fire on SELECT.

A few remarks

  1. When I create databases, I always use a database role to assign permissions to. I don’t ever assign directly to an LDAP or SQL login incase the user leaves or no longer needs access. By maintaining permission on the database role, it makes the maintenance a lot easier
  2. The script watermark suggests that if you need to create a custom view, to prefix it with “vt_” so that it does not get picked up in this auto-generation process. You can still name a view with a “v_” and as long as the watermark doesn’t appear in the definition, it won’t get overwritten. However, I’ve found it a lot easier to request that developers just use the “vt_” prefix for custom views so we can identify quickly if a view was auto-generated or custom.

Thanks for reading and any feedback is greatly appreciated.

CODE:

usp_update_bus_users_permissions