Legacy data

Legacy data is information stored in an old format or on an obsolete storage medium, which makes it difficult to access or process.

Problems with accessing legacy data can have several causes:

no hardware to read data stored on an obsolete storage medium (e.g. floppy disk, tape streamer),
no software to open a file or convert it to a readable format.

Much of the information collected by early information systems was kept on streamer tapes or floppy disks. Working floppy or tape drives are now hard to find, so accessing this data is difficult. Many companies scrapped their old computers because they seemed unnecessary. However, it is sometimes essential to retrieve old legacy data, e.g. from human resources systems or from government, insurance or health systems. There are service companies that help to recover such legacy data.

Many households still have old video tapes (VHS or other formats). The best solution is to find a working video tape player, connect it to a computer and convert the recordings into files. A video capture card is usually required. Many photo services offer this conversion at a low price.

It is easier to restore legacy data from files that are kept in old formats. Some old applications are available on the internet as abandonware. However, some modern computers and operating systems are unable to run those applications. In that case emulators can help (e.g. DOSemu); they emulate the old environment on a modern computer. Another way is to find a converter. Converters were created for many once popular formats (e.g. ChiWriter). They convert files into readable formats^[1].
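For the simplest case, a plain text file written under DOS, a converter can be only a few lines long. The sketch below is an illustration under stated assumptions, not a tool for any particular format: it assumes the file uses code page 437, and the function name `convert_legacy_text` is hypothetical. Binary formats such as ChiWriter need a dedicated converter that understands their structure.

```python
from pathlib import Path


def convert_legacy_text(src: Path, dst: Path, legacy_encoding: str = "cp437") -> int:
    """Re-encode a legacy text file as UTF-8; return the number of characters written."""
    raw = src.read_bytes()
    # DOS text files often end with a Ctrl-Z (0x1A) end-of-file marker; drop it.
    raw = raw.rstrip(b"\x1a")
    # Assumption: the file was written in the given legacy code page.
    text = raw.decode(legacy_encoding)
    # Normalise DOS line endings (CR LF) to a single LF.
    text = text.replace("\r\n", "\n")
    with open(dst, "w", encoding="utf-8", newline="\n") as out:
        out.write(text)
    return len(text)
```

If the original code page is unknown, the decoded text has to be checked by eye, because a wrong guess usually produces readable but incorrect accented characters rather than an error.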

Legacy data and organisations

Because they use large amounts of data, many organisations have invested in special systems that manage data and provide access to it. These systems allow legacy data to be separated, merged and grouped. In such a system, legacy data does not have to be moved elsewhere, because the middleware system provides an integrated view of the legacy data sources^[2]. Wrappers are software components that convert data from the LDM (legacy data model) to the WDM (wrapper data model)^[3]. They are crucial elements of the middleware system, and data sources are encapsulated by wrappers. Wrappers model legacy data as objects and provide standard interfaces for query execution and method invocation^[4].
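The relationship between middleware and wrappers can be sketched in a few classes. This is a minimal illustration, not the interface of any system described in the cited papers: the class names `Wrapper`, `FixedWidthWrapper` and `Middleware`, the fixed-width record layout and the sample records are all assumptions made for the example. Each wrapper converts its own legacy records into a common dictionary form and exposes the same two operations, query and method invocation.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, Iterator, List, Tuple

Record = Dict[str, str]               # hypothetical wrapper data model (WDM)
Layout = Dict[str, Tuple[int, int]]   # field name -> (start, end) column


class Wrapper(ABC):
    """Standard interface the middleware relies on, whatever the source is."""

    @abstractmethod
    def query(self, predicate: Callable[[Record], bool]) -> Iterator[Record]:
        """Yield every object of the source that satisfies the predicate."""

    @abstractmethod
    def invoke(self, key: str, method: str) -> Any:
        """Call a named method on the object identified by key."""


class FixedWidthWrapper(Wrapper):
    """Encapsulates fixed-width text records (the legacy data model here)."""

    def __init__(self, lines: List[str], layout: Layout, key_field: str,
                 methods: Dict[str, Callable[[Record], Any]]) -> None:
        self._lines = lines
        self._layout = layout
        self._key_field = key_field
        self._methods = methods

    def _to_record(self, line: str) -> Record:
        # LDM -> WDM conversion: slice each column and strip the padding.
        return {name: line[start:end].strip()
                for name, (start, end) in self._layout.items()}

    def query(self, predicate):
        for line in self._lines:
            record = self._to_record(line)
            if predicate(record):
                yield record

    def invoke(self, key, method):
        for record in self.query(lambda r: r[self._key_field] == key):
            return self._methods[method](record)
        raise KeyError(key)


class Middleware:
    """Offers one integrated view; the data itself stays in the sources."""

    def __init__(self, wrappers: Iterable[Wrapper]) -> None:
        self._wrappers = list(wrappers)

    def query(self, predicate):
        for wrapper in self._wrappers:
            yield from wrapper.query(predicate)


# Two made-up sources with different layouts, queried through one interface.
hr = FixedWidthWrapper(
    lines=[f"{'0001':<4}{'Anna Kowalska':<16}{'HR':<4}",
           f"{'0002':<4}{'Jan Nowak':<16}{'IT':<4}"],
    layout={"id": (0, 4), "name": (4, 20), "dept": (20, 24)},
    key_field="id",
    methods={"label": lambda r: f"{r['name']} ({r['dept']})"},
)
payroll = FixedWidthWrapper(
    lines=[f"{'0001':<4}{'5200':>6}"],
    layout={"id": (0, 4), "salary": (4, 10)},
    key_field="id",
    methods={},
)
middleware = Middleware([hr, payroll])
matches = list(middleware.query(lambda r: r["id"] == "0001"))
```

Here `matches` holds one record from each source for the same identifier, although the two sources store their fields in different columns. The middleware never needs to know either layout.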

Legacy data systems

Legacy data systems may be based on different data models, which makes wrapping these systems difficult. A wrapper must hide the data model implemented by the legacy system, so that the differences between these models are handled inside the wrapper. The wrapper exposes a canonical model that is more generic. The description of the database offered by the wrapper must be semantically richer than that given by the DDL statements. In addition, different data models come with different languages for manipulating data, so the wrapper translates queries or commands from one language to another. Unfortunately, such translation is not always possible. Sometimes the wrapper must simulate operations and behaviour because the canonical model requires them. For COBOL files, the wrapper must simulate the associated update and delete modes, which define how these operations are controlled and propagated, if the canonical model contains inter-object relationships or foreign keys. Likewise, for COBOL files the wrapper must simulate some simple forms of alternative predicates if the model offers a language similar to SQL^[5].
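To illustrate what simulating a delete mode can mean, the sketch below lets a wrapper propagate deletions across record files that have no notion of foreign keys themselves. It is a simplified assumption-laden example: the files are plain in-memory lists standing in for COBOL record files, a cascading delete is only one possible mode, and the class name `ReferentialWrapper` and all file and field names are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# (child_file, child_field, parent_file, parent_field)
ForeignKey = Tuple[str, str, str, str]


class ReferentialWrapper:
    """Simulates a cascading delete on top of files that cannot enforce it."""

    def __init__(self, files: Dict[str, List[Dict[str, Any]]],
                 foreign_keys: List[ForeignKey]) -> None:
        self.files = files                  # stand-in for the legacy record files
        self._foreign_keys = foreign_keys   # declared in the wrapper, not in the files

    def delete(self, file_name: str, field: str, value: Any) -> int:
        """Delete matching records and, recursively, the records that reference them.

        Returns the total number of records removed.
        """
        records = self.files[file_name]
        doomed = [r for r in records if r[field] == value]
        if not doomed:
            return 0
        # Remove the parents first, so a cyclic reference cannot loop forever.
        self.files[file_name] = [r for r in records if r[field] != value]
        removed = len(doomed)
        for child_file, child_field, parent_file, parent_field in self._foreign_keys:
            if parent_file != file_name:
                continue
            for parent in doomed:
                removed += self.delete(child_file, child_field, parent[parent_field])
        return removed


wrapper = ReferentialWrapper(
    files={
        "customers": [{"id": 1}, {"id": 2}],
        "orders": [{"id": 10, "customer_id": 1},
                   {"id": 11, "customer_id": 1},
                   {"id": 12, "customer_id": 2}],
        "order_lines": [{"order_id": 10}, {"order_id": 10}, {"order_id": 12}],
    },
    foreign_keys=[("orders", "customer_id", "customers", "id"),
                  ("order_lines", "order_id", "orders", "id")],
)
removed = wrapper.delete("customers", "id", 1)
```

Deleting customer 1 also removes that customer's two orders and the two order lines attached to them, so `removed` is 5. The relationship rules live entirely in the wrapper, which is why its description of the data is richer than what the files themselves declare.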

A wrapper intended for legacy data sources has three dimensions. Ph. Thiran and J.-L. Hainaut list them as:

"the model-wrapper
the instance-wrapper
the upper-wrapper"^[6]

The first and second dimensions are created automatically, while the third is created manually^[7].

Goals of the wrapper architecture

M.T. Roth and P. Schwarz argue that their wrapper architecture meets several goals that make it suitable for integrating a diverse set of data sources. These goals are:

"The start-up cost to write a wrapper should be small
Wrappers should be able to evolve
The architecture should be flexible and allow for graceful growth
The architecture should readily lend itself to query optimization"^[8]

References

Rodriguez J.B., Gómez-Pérez A. (2006). Upgrading relational legacy data to the semantic web. Proceedings of the 15th International Conference on World Wide Web (pp. 1069-1070). ACM.
Roth M.T., Schwarz P.M. (1997). Don't Scrap It, Wrap It! A Wrapper Architecture for Legacy Data Sources. VLDB.
Thiran P., Hainaut J.L. (2001). Wrapper Development for Legacy Data Reuse. Proceedings Eighth Working Conference on Reverse Engineering.

Footnotes

↑ Rodriguez J.B., Gómez-Pérez A. (2006)
↑ Roth M.T., Schwarz P.M. (1997)
↑ Thiran P., Hainaut J.L. (2001)
↑ Roth M.T., Schwarz P.M. (1997)
↑ Thiran P., Hainaut J.L. (2001)
↑ Thiran P., Hainaut J.L. (2001)
↑ Thiran P., Hainaut J.L. (2001)
↑ Roth M.T., Schwarz P.M. (1997)

Author: Joanna Zawiślan