File type integration process

Until now I had worked only with Query type and Time based integration processes. Readers who have never worked with Scribe might never have heard of these terms, so just to give a brief idea: Scribe supports four types of integration processes, viz.

1. File Type: The event is triggered when there is new file activity in the event folder.

2. Query Type: The event is triggered when the source query returns some data.

3. Time based: The event is triggered at fixed intervals set by the Console admin.

4. Queue based: This event is triggered when a new data object is placed in the Scribe input queue.

In one of our latest requirements there was a scenario where several production machines generate different log files from time to time. These logs contain information like start time, chemicals used, amount of chemicals, etc. The log files would be placed in a predefined folder on the network.

The Scribe packages were supposed to read these logs and dump the data into staging tables, from where other packages would pick it up for further processing.

Accordingly, we planned to use the ‘File’ type integration process provided by Scribe. The following are some critical aspects that need to be kept in mind.

1. Remote folder location: The integration process requires the user to input the folder location and the type of file that needs to be read. This is the location whose file activity raises the event and initiates the integration process.

2. DTS reading location: The DTS that uses the file with a specific schema as its source need not read from the same location as the remote folder location used in the integration process; it can be configured to read from a location independent of the remote one. This feature comes in handy when the incoming files arrive with a different file name every time: the integration process raises an event each time a file is placed in the folder, but the DTS reads only from the file name that was entered at development time.

This can be handled by copying and renaming the incoming file to the DTS location, which can be achieved using a simple batch file or a .NET exe. The script/exe can be run as a pre-run process in the integration process (a sketch follows after this list).

3. Trigger on regular intervals: It may happen that more than one file is placed in the remote location at a time, but the DTS can read only a single file per run. To handle this, the Scribe integration process can also be set to trigger by itself after a given interval of time. Using this feature along with some custom queuing logic, the problem of multiple files can be resolved.
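To make the pre-run idea concrete, here is a minimal sketch of such a script. The post above mentions a batch file or a .NET exe; Python is used here purely as a stand-in, and the folder paths, file pattern, and fixed DTS file name are all hypothetical. The "queue" is simply the set of files waiting in the remote folder, drained oldest first, one file per run.

```python
import glob
import os
import shutil

# Hypothetical locations -- adjust to your environment.
REMOTE_FOLDER = r"\\fileserver\machine-logs"   # folder the integration process watches
DTS_FILE = r"C:\ScribeWork\current_log.csv"    # fixed file name the DTS was developed against

def stage_next_file():
    """Copy the oldest pending log file to the fixed name the DTS reads from."""
    pending = sorted(glob.glob(os.path.join(REMOTE_FOLDER, "*.csv")),
                     key=os.path.getmtime)
    if not pending:
        return False                 # nothing waiting; this run has no work
    oldest = pending[0]
    shutil.copy(oldest, DTS_FILE)    # "rename" to the name the DTS expects
    os.remove(oldest)                # dequeue: leave the rest for later runs
    return True

if __name__ == "__main__":
    stage_next_file()
```

Run as the pre-run process together with the timed trigger, each run consumes exactly one file, so several files arriving at once are simply processed one per interval.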

Scribe Insight: Multiple sleeping processes on SQL Server.

Yesterday we were facing a severe performance issue on our SQL Server. We have a table into which one of our Scribe DTS packages pushes data before inserting it into MS-CRM. For some unknown reason the data was not getting inserted (the investigation of that issue will be posted in a separate blog shortly). While investigating, we needed to delete the data in the table so as to start with a fresh set of data.

But somehow the data was not getting deleted. We even tried truncating the table, but could not get through, and yet the system was not giving any error. This made us think that some sort of blocking was occurring on the table: some process had a lock on the data and was not releasing it. (A true deadlock would have been detected by SQL Server and reported with an error; silent waiting like this points to plain blocking.)

On running the “sp_who2” command in the SQL editor, we found more than 200 processes created by Scribe. We thought these processes were the reason performance was being hampered: so many processes trying to access the same resource is bound to create performance issues. We were also concerned that the number of connections would keep increasing every time the scheduled package ran, with these processes consuming precious memory and eventually exhausting all resources.
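For anyone wanting to repeat the same check, here is a rough sketch that scripts it with Python and pyodbc instead of eyeballing sp_who2 output in the editor. The connection string is hypothetical, and sys.sysprocesses is the legacy system view that sp_who2 reads from.

```python
import pyodbc

# Hypothetical connection string -- point it at the server hosting ScribeInternal.
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=localhost;Trusted_Connection=yes;")
cursor = conn.cursor()

# Summarise sessions per database and status -- a scripted version of
# scanning the sp_who2 output by eye.
cursor.execute("""
    SELECT DB_NAME(dbid) AS db, status, COUNT(*) AS sessions
    FROM sys.sysprocesses
    WHERE spid > 50                 -- skip system sessions
    GROUP BY DB_NAME(dbid), status
    ORDER BY sessions DESC
""")
for db, status, sessions in cursor.fetchall():
    print(f"{(db or '?'):>20}  {status.strip():<12} {sessions}")
```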

On further investigation we found that Scribe creates these processes to make sure memory is cached and available to respond quickly to an actual process request. Most of these processes were in sleeping mode, waiting for commands. Another interesting observation was that all of these sleeping processes were pointing to the ScribeInternal database. There were a few other processes pointing to our custom databases, and a few of them were suspended because of the blocking issue. The system started working again after we manually killed these suspended processes.
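If you run into the same situation, a sketch along the following lines can list the suspended sessions and prepare the KILL statements. The filter on program_name is an assumption about how Scribe's connections identify themselves, so review the output before executing any KILL.

```python
import pyodbc

# Hypothetical connection string -- same server as above.
conn = pyodbc.connect("DRIVER={SQL Server};SERVER=localhost;Trusted_Connection=yes;")
cursor = conn.cursor()

# Find suspended Scribe sessions and emit KILL statements for manual review.
cursor.execute("""
    SELECT spid, DB_NAME(dbid) AS db, program_name
    FROM sys.sysprocesses
    WHERE status = 'suspended'
      AND program_name LIKE '%Scribe%'   -- assumption: Scribe tags its connections
""")
for spid, db, program in cursor.fetchall():
    print(f"-- {program.strip()} on {db}")
    print(f"KILL {spid};")
```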

The Scribe processes awake only when there is a task to be done and hence do not consume additional resources while sleeping. These processes are monitored by the Scribe Monitor Service.