Corrupt xlsx2csv
-
Version 1.0
Excel 2007 files are really zipped collections of
mostly XML files. XML is not tolerant of file
corruption, and from the errors generated it appears
that Excel 2007 uses a fairly corruption-intolerant
XML reading algorithm, even when it is only trying to
salvage unformatted data from corrupt xlsx files.
Corrupt XLSX2CSV uses an unzipper which is tolerant of
XML file corruption and uses Perl code to extract
the sharedStrings.xml and worksheet.xml files, where
all of the unformatted data in an xlsx file resides.
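
As an illustration of what a corruption-tolerant unzipper can look like, here is a minimal Python sketch. It is not the program's own Perl code, whose internals are not described here. It assumes the damage leaves the ZIP local file headers readable, scans for them directly instead of relying on the central directory at the end of the archive, and keeps whatever bytes inflate cleanly before a damaged region. The names salvage_members and inflate_partial are hypothetical.

```python
import struct
import zlib

LOCAL_SIG = b"PK\x03\x04"                  # ZIP local file header signature
HEADER = struct.Struct("<4sHHHHHIIIHH")    # fixed 30-byte local file header


def inflate_partial(raw, chunk=1024):
    """Inflate a raw deflate stream, keeping output produced before any error."""
    d = zlib.decompressobj(-15)            # -15 = raw deflate, as used inside ZIP
    out = bytearray()
    for i in range(0, len(raw), chunk):
        try:
            out += d.decompress(raw[i:i + chunk])
        except zlib.error:
            break                          # damaged region reached: keep what we have
        if d.eof:
            break                          # clean end of this member's stream
    return bytes(out)


def salvage_members(data, wanted=("sharedStrings.xml", "worksheets/sheet")):
    """Return {member name: recovered bytes} for members whose name matches."""
    found = {}
    pos = data.find(LOCAL_SIG)
    while pos != -1 and pos + HEADER.size <= len(data):
        (_sig, _ver, _flags, method, _time, _date,
         _crc, csize, _usize, nlen, xlen) = HEADER.unpack_from(data, pos)
        name_start = pos + HEADER.size
        name = data[name_start:name_start + nlen].decode("utf-8", "replace")
        start = name_start + nlen + xlen   # first byte of the member's data
        if any(w in name for w in wanted):
            if method == 8:                # deflated member
                found[name] = inflate_partial(data[start:])
            elif method == 0:              # stored (uncompressed) member
                found[name] = data[start:start + csize]
        # Simplification: a signature occurring by chance inside compressed
        # data would be misread as a header.
        pos = data.find(LOCAL_SIG, pos + 4)
    return found
```

Calling salvage_members on the raw bytes of an xlsx file returns the shared strings and worksheet XML as far as they could be inflated, which is the input a text-based extraction step would then work on.
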
Because this Perl code does not use a standard XML
reading module but instead identifies the cell data
as text or string and extracts it directly, the result
is more or less perfectly extracted data up to the
point in the XML files where the corruption starts.
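
The text-matching approach described above can be sketched as follows. This is a Python illustration rather than the program's Perl source, and the regular expressions, the function names salvage_shared_strings and salvage_cells, and the sample data are all assumptions made for the example. Only elements that are complete in the text are matched, so everything before the point of damage is recovered and the broken tail is simply ignored.

```python
import html
import re

# One complete shared-string item, and the text runs inside it.
SI_RE = re.compile(r"<si>(.*?)</si>", re.DOTALL)
T_RE = re.compile(r"<t(?:\s[^>]*)?>(.*?)</t>", re.DOTALL)
# One complete cell: either self-closing (empty) or with a body.
CELL_RE = re.compile(r"<c\s([^>]*?)(?:/>|>(.*?)</c>)", re.DOTALL)
REF_RE = re.compile(r'\br="([A-Z]+[0-9]+)"')
VAL_RE = re.compile(r"<v>(.*?)</v>", re.DOTALL)


def salvage_shared_strings(xml_text):
    """Return the shared strings that are complete in the (possibly cut) text."""
    strings = []
    for item in SI_RE.finditer(xml_text):
        runs = T_RE.findall(item.group(1))
        strings.append(html.unescape("".join(runs)))
    return strings


def salvage_cells(sheet_text, shared):
    """Return {cell reference: value} for every complete cell that has a value."""
    cells = {}
    for m in CELL_RE.finditer(sheet_text):
        attrs, body = m.group(1), m.group(2) or ""
        ref = REF_RE.search(attrs)
        val = VAL_RE.search(body)
        if not ref or not val:
            continue                      # empty or unreadable cell
        text = html.unescape(val.group(1))
        if re.search(r'\bt="s"', attrs):  # value is an index into shared strings
            if not text.isdecimal() or int(text) >= len(shared):
                continue                  # index points past what was recovered
            text = shared[int(text)]
        cells[ref.group(1)] = text
    return cells


if __name__ == "__main__":
    # Both samples are cut off mid-element to imitate corruption.
    shared_xml = "<sst><si><t>Name</t></si><si><t>A &amp; B</t></si><si><t>Trun"
    sheet_xml = ('<sheetData><row r="1"><c r="A1" t="s"><v>0</v></c>'
                 '<c r="B1"><v>3.5</v></c></row><row r="2"><c r="A2"><v>4')
    shared = salvage_shared_strings(shared_xml)
    print(shared)
    print(salvage_cells(sheet_xml, shared))
```

In the sample, the unfinished third shared string and the unfinished cell A2 are dropped, while the complete entries before them are returned.
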
By contrast, Excel 2007 appears to return no results
for a given XML file if it encounters any errors
in it at all.
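
For comparison, a strict XML parser behaves in the all-or-nothing way described above. The sketch below uses Python's standard xml.etree.ElementTree purely as an example of such a parser. It says nothing about how Excel 2007 is actually implemented, and strict_parse is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strict_parse(xml_text):
    """Return all <t> texts, or None if the document is not well-formed."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return None                       # one error anywhere loses everything
    # Compare local names so namespaced tags such as {uri}t also match.
    return [el.text or "" for el in root.iter()
            if el.tag.rsplit("}", 1)[-1] == "t"]


if __name__ == "__main__":
    intact = "<sst><si><t>Name</t></si></sst>"
    truncated = "<sst><si><t>Name</t></si><si><t>Trun"
    print(strict_parse(intact))
    print(strict_parse(truncated))
```

The truncated sample still contains one complete string, but the strict parser rejects the whole document, so nothing is returned for it.
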
The program has a Perl/Tk GUI front end. It can also