This tool originated when a team of us contemplated a data conversion project. While the final database was under 10GB, the legacy system consisted of over 30 individual systems, each running one of two bespoke mainframe applications or a SQL Server-based practice management system. The new incarnation integrated an Oracle-based legal practice management system with Oracle Financials and more.
Data conversions can be a big deal. Even from a simple engineering point of view, there is a lot to worry about. There is all the data to be marshalled: database extracts, proprietary binary dumps, spreadsheets, comma-separated values in flat files, and so on. Then there can be thousands of scripts, and they come in a variety of flavours: data manipulation obviously, but also data definition, stored procedures, reports, export and import utilities to run, utilities to transform spreadsheets into CSV files, and more. Just executing them isn't enough, however. We wanted to automate as much as possible: having someone sit there executing scripts by hand is slow, wasteful and error-prone. This gives us the first requirement for the tool we wanted: it had to execute these scripts automatically, with no extra input from the user.
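A minimal sketch in Python conveys the idea behind that first requirement: a driver reads an ordered manifest of commands and runs each one without touching an interactive console. The manifest format and helper names here are invented for illustration; they are not ScriptRunner's actual interface.

    # Illustrative sketch only: reads a manifest of commands (one per
    # line, in execution order) and runs each one non-interactively.
    import subprocess
    import sys

    def run_script(command):
        """Run one script non-interactively, capturing all output."""
        return subprocess.run(
            command,
            capture_output=True,  # nothing goes to an interactive console
            text=True,
        )

    def main(manifest_path):
        with open(manifest_path) as manifest:
            commands = [line.split() for line in manifest if line.strip()]
        for command in commands:
            print("running:", " ".join(command))
            run_script(command)

    if __name__ == "__main__":
        main(sys.argv[1])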
Scripts fail. They fail spectacularly; they fail quietly; they fail unpredictably in data conversions because the data quality is uneven. During a conversion, any failure should cause the run to stop before errors are propagated. Once erroneously converted data starts being re-used, the only sensible way to recover is to give up and start again from a full database backup, and that is too time-consuming to contemplate doing frequently. The tool therefore had to fail immediately, halting the run at the first script that goes wrong, so the problem can be investigated before any more damage is done.
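Continuing the sketch above, the fail-fast policy amounts to a check on each script's exit status; again, the names are illustrative only.

    # Illustrative fail-fast loop: the first non-zero exit status stops
    # the whole run, so no later script consumes erroneous data.
    import subprocess
    import sys

    def run_all(commands):
        for command in commands:
            result = subprocess.run(command, capture_output=True, text=True)
            if result.returncode != 0:
                sys.stderr.write("FAILED: %s\n%s\n"
                                 % (" ".join(command), result.stderr))
                sys.exit(result.returncode)  # halt before errors propagate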
The data often needs to be loaded in a particular order, so the order of script execution matters. In fact, some scripts have very specific preconditions that must hold before they can be run. Not all the scripts will be under the control of the data conversion team; they may have been contributed by a vendor, by the DBA or by other project teams, and these cannot even be optimized or rewritten the way you'd like.
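One way such preconditions could be expressed, sketched here as an assumption rather than a description of ScriptRunner, is to pair each script with an optional guard command that must succeed before the script itself is run. The vendor's script stays untouched; only the pairing is ours.

    # Illustrative precondition guard: the guard command must exit 0
    # before the real script is allowed to run.
    import subprocess

    def run_with_precondition(command, guard=None):
        if guard is not None:
            check = subprocess.run(guard)
            if check.returncode != 0:
                raise RuntimeError("precondition failed for %s"
                                   % " ".join(command))
        subprocess.run(command, check=True)  # check=True raises on failure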
Data conversions are naturally long-running jobs. Lots of data to process. Individual legacy rows may even be processed multiple times for different reasons. One immediate implication is that a data conversion should not be run from some developer's desktop: the risk of interruption by a corporate reboot policy or an idle time-out is unacceptable. Our tool, ScriptRunner, should be able to run without an interactive session.
Notwithstanding the earlier comment about desktop PCs, in practice that is where development occurs. Since our developer PCs ran Windows and the database machines were Solaris, any tool had to be portable enough to run on both platforms.
To help optimize the conversion process, it would be nice to be able to see where the time goes, so the tool should record how long each script takes to run.
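Assuming per-script timing is what is wanted here, a wrapper of a few lines would do; the output format below is made up for illustration.

    # Illustrative timing wrapper: records each script's elapsed time so
    # the slowest steps are easy to find on the next rehearsal.
    import subprocess
    import time

    def timed_run(command):
        start = time.monotonic()
        result = subprocess.run(command)
        elapsed = time.monotonic() - start
        print("%-40s %8.1fs exit=%d"
              % (" ".join(command), elapsed, result.returncode))
        return result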
Financial systems are periodically audited. However, different applications will store the same number in different ways; a debt, for example, could be positive or negative. An auditor may well insist on knowing how the old number was transformed into the new number. The auditor, coming in months after the conversion is signed off, may ask to see not only the relevant scripts but also some proof that they were the scripts that ran. Thus, we need a clear and persistent audit trail of scripts executed.
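One plausible form for such a trail, sketched here as an assumption about what would satisfy an auditor rather than as ScriptRunner's actual mechanism, is to hash each script's exact text before it runs and append the digest to a log. Months later, re-hashing the archived script proves it is the one that was executed.

    # Illustrative audit record: hash the script's exact bytes and append
    # a timestamped line to a log file. The log format is invented.
    import datetime
    import hashlib

    def audit_record(script_path, log_path="conversion_audit.log"):
        with open(script_path, "rb") as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        stamp = datetime.datetime.now().isoformat(timespec="seconds")
        with open(log_path, "a") as log:
            log.write("%s  %s  sha256=%s\n" % (stamp, script_path, digest))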