Monday, February 20, 2012

Missing files after using File System Task

Hi All,

I don't know if anyone has faced this issue. We are having a strange problem. Our process worked well when it ran on a 32-bit processor; it ran perfectly for 6 months without a problem. But when we moved the packages to a 64-bit machine, this issue, along with some others, started to show up.

The issue is that files are going missing from the source folder.

Our process is designed as follows. A source (SRC) process brings in a file and updates a status for that file in an audit table. Once the SRC process is complete, the ETL process picks up the file, sets its status to ‘running’, loads it into the target DB, and updates the ETL status to ‘complete’. The current problem is that the ETL loses files after it sets the status to ‘running’. When we checked the DB to see whether the data had been loaded, we could not find any data related to these files.
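The status lifecycle above can be sketched as a small state machine. This is a minimal Python sketch, assuming an in-memory dict stands in for the audit table; the status name `arrived` and the helpers `advance` and `stuck_in_running` are hypothetical. The files that end up stuck in `running` are the ones this post is chasing.

```python
from enum import Enum


class FileStatus(Enum):
    # Hypothetical values; the post only names 'running' and 'complete'.
    ARRIVED = "arrived"    # set by the SRC process after it lands the file
    RUNNING = "running"    # set by the ETL when it picks the file up
    COMPLETE = "complete"  # set by the ETL after the load into the target DB


# Legal transitions in the lifecycle described in the post.
ALLOWED = {
    FileStatus.ARRIVED: {FileStatus.RUNNING},
    FileStatus.RUNNING: {FileStatus.COMPLETE},
    FileStatus.COMPLETE: set(),
}


def advance(audit: dict, file_name: str, new: FileStatus) -> None:
    """Move one file's audit row to `new`, rejecting out-of-order updates."""
    current = audit[file_name]
    if new not in ALLOWED[current]:
        raise ValueError(f"{file_name}: cannot go from {current.value} to {new.value}")
    audit[file_name] = new


def stuck_in_running(audit: dict) -> list:
    """Files the ETL claimed but never finished: the symptom in this post."""
    return sorted(name for name, s in audit.items() if s is FileStatus.RUNNING)
```

A periodic check like `stuck_in_running` would flag lost files early instead of waiting for someone to notice missing data in the target DB.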

We have mapping-level parameters for the source path and the target path.

We are using a Foreach Loop container to process the files (simple flat files) in the source path. The file name is stored in a mapping-level parameter. Once a file is processed, we move it into the target path.

Our src and target file paths are on the same drive: there is a src folder, and inside it a processed folder and a failed folder. Files are picked up from the src folder and moved into the processed folder after processing. The missing files are not even moved to the failed folder.
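The file-handling loop described above can be sketched roughly as follows. This is a Python analogue of the Foreach Loop, not SSIS itself; `process_folder` and the `load` callback (standing in for the data flow that loads one file into the target DB) are hypothetical names.

```python
import shutil
from pathlib import Path
from typing import Callable


def process_folder(src: Path, load: Callable[[Path], None]) -> None:
    """Process each flat file in `src`, then move it to src/processed
    on success or to src/failed if loading raised an exception."""
    processed = src / "processed"
    failed = src / "failed"
    processed.mkdir(exist_ok=True)
    failed.mkdir(exist_ok=True)
    # Snapshot the file list first so moving files does not disturb iteration;
    # is_file() skips the processed/failed subfolders themselves.
    for f in sorted(p for p in src.iterdir() if p.is_file()):
        try:
            load(f)
        except Exception:
            shutil.move(str(f), str(failed / f.name))
        else:
            shutil.move(str(f), str(processed / f.name))
```

In this layout every file should end up in exactly one of the two subfolders, which is why a file that appears in neither is a strong hint that something deleted it.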

There is a lot of other processing going on on this box, and the trend we observed is that when more processes are running at peak hours, more files go missing.

Right now we are refetching those files as a workaround, but does anyone have any suggestions as to why this is happening, or any suggestions for a better implementation?

Thanks

We solved our problem. It was caused by a combination of problems:

1. The environment setup of our new 64-bit server.

2. Use of an event handler on 'OnError' instead of 'OnTaskFailed'.

3. The overwrite destination option in the move file (File System Task) properties in the event handler.

4. Consistent connection (I don't remember the exact name, but it's in the properties of the connections in the Connection Manager).

1st and 4th issues: The symptom was that more files went missing when the system was very busy, also processing other packages. When we set up our original 32-bit machine, we had named pipes enabled on both the client and the server, so there were no issues. But in our new 64-bit environment, named pipes were disabled on the server (Shared Memory and TCP/IP were enabled) while they were enabled on the client. When the client wants a connection, it tries the protocols in order: first Shared Memory, then TCP/IP, and last named pipes. But the server only accepts connections on the first two. So the connections were timing out during peak processing.

Added to that, the connection property in the package (the 4th issue) was not set to a stable state, so every time the package encountered a task that needed a DB connection, it requested a new connection. Because named pipes were disabled, this multiplied the problem. The solution to both issues is to enable named pipes on the server (or disable them on the client; enabling is better, since it gives the process extra connection options) and, in the package connection properties, set the connection to a stable state. This option keeps one stable connection from the start of package execution to the end, instead of connecting and disconnecting for each task.
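To see why reconnecting for every task hurts under load, here is a toy Python model that counts connection attempts. It is not the SSIS API: `CountingConnectionManager`, `run_package`, `chance_of_any_failure`, and the per-attempt failure probability are illustrative assumptions. The "stable state" setting described above most likely corresponds to the connection manager's RetainSameConnection property, modeled here as `retain=True`.

```python
class CountingConnectionManager:
    """Toy stand-in for a package connection manager.
    retain=True keeps one connection for the whole run;
    retain=False opens a fresh connection for every task."""

    def __init__(self, retain: bool):
        self.retain = retain
        self.opens = 0
        self._conn = None

    def acquire(self):
        if self.retain and self._conn is not None:
            return self._conn          # reuse the connection opened earlier
        self.opens += 1                # each open is one more chance to time out
        conn = object()                # placeholder for a real DB connection
        if self.retain:
            self._conn = conn
        return conn


def run_package(tasks_per_file: int, files: int, retain: bool) -> int:
    """Run every DB-touching task for every file; return connections opened."""
    cm = CountingConnectionManager(retain)
    for _ in range(files):
        for _ in range(tasks_per_file):
            cm.acquire()
    return cm.opens


def chance_of_any_failure(opens: int, p: float) -> float:
    """Probability that at least one of `opens` independent attempts fails,
    if each attempt independently times out with probability `p`."""
    return 1 - (1 - p) ** opens
```

With a hypothetical 0.1% timeout chance per attempt, 4 DB tasks per file and 100 files give 400 attempts and roughly a one-in-three chance that at least one times out, versus 0.1% for a single retained connection.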

2nd and 3rd issues: These issues were already present on our 32-bit machine, but did not show up because we had no connection failure errors due to named pipes. Since connections were timing out in our new environment, each failure raised an error. When the first error was raised, the src file was moved to the destination folder (with the overwrite destination option set to true). But because of the connection failures, we often had multiple error events, so on each subsequent error event the handler tried to move the src file again, even though it was no longer there because it had already been moved the first time. Since the overwrite destination option was true, the package first deleted the target file, then tried to move the src file and failed, because the src file was not present. This led to the missing file problem.
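The failure sequence above can be reproduced outside SSIS. This is a minimal Python sketch, assuming (as described above) that a move with overwrite enabled deletes the destination before moving the source; `move_file`, `on_error_handler`, and `simulate` are hypothetical names, and each call of the handler stands for one OnError firing.

```python
import os
import tempfile
from pathlib import Path


def move_file(src: Path, dest: Path, overwrite: bool) -> None:
    """Move as described in the post: with overwrite on, clear the
    destination first, then move the source."""
    if dest.exists():
        if not overwrite:
            raise FileExistsError(dest)
        dest.unlink()                  # destination is deleted up front ...
    os.rename(src, dest)               # ... then this fails if src is gone


def on_error_handler(src: Path, dest: Path, overwrite: bool) -> None:
    """One OnError firing: move the source file aside. The sketch simply
    ignores a failed move, so later firings still run."""
    try:
        move_file(src, dest, overwrite)
    except OSError:
        pass


def simulate(error_count: int, overwrite: bool) -> bool:
    """Return True if the file still exists anywhere after the handlers run."""
    with tempfile.TemporaryDirectory() as d:
        src = Path(d) / "input.txt"
        dest_dir = Path(d) / "failed"
        dest_dir.mkdir()
        dest = dest_dir / "input.txt"
        src.write_text("payload")
        for _ in range(error_count):   # several errors from one failing task
            on_error_handler(src, dest, overwrite)
        return src.exists() or dest.exists()
```

`simulate(1, True)` keeps the file, but `simulate(2, True)` loses it: the second firing deletes the already-moved copy and then fails to find the source. With `overwrite=False` the second firing fails harmlessly and the file survives.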

The solution to this problem is simply to set the overwrite destination option to false in the File System Task. Depending on your requirements, you can also put the event handler on the 'OnTaskFailed' event instead of the 'OnError' event.
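Another way to make the error path safe to run more than once is to make the move idempotent and never delete an existing destination. This is a minimal Python sketch with a hypothetical helper `safe_move`; it mirrors the overwrite-off setting rather than reproducing the File System Task. Moving the handler to OnTaskFailed helps for a related reason: that event fires once when a task fails, whereas OnError fires once per error raised.

```python
import shutil
from pathlib import Path


def safe_move(src: Path, dest_dir: Path) -> str:
    """Idempotent move into `dest_dir` that never deletes an existing file."""
    dest = dest_dir / src.name
    if not src.exists():
        # An earlier handler run already moved it (or it was never there).
        return "already-moved" if dest.exists() else "missing"
    if dest.exists():
        # Overwrite disabled: refuse to clobber, keep both copies.
        raise FileExistsError(dest)
    shutil.move(str(src), str(dest))
    return "moved"
```

Called three times for the same file, this returns `"moved"` once and `"already-moved"` twice, and the moved copy is left intact.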

Hope this saves a lot of people a lot of pain, and saves many companies a lot of bucks.

Excuse my spelling mistakes if any.

Thanks,

Venkat
