
Copyright © 2021 Plan 9 Foundation.
Distributed under the MIT License.


.TH DOC2TXT 1
.SH NAME
doc2txt, xls2txt, olefs, mswordstrings, msexceltables \- extract printable strings from Microsoft Office documents
.SH SYNOPSIS
.B doc2txt
[
.I file.doc
]
.br
.B xls2txt
[
.I file.xls
]
.br
.B aux/olefs
[
.B -m
.I mtpt
]
.I file.doc
.br
.B aux/mswordstrings 
.I /mnt/doc/WordDocument
.br
.B aux/msexceltables
[
.B -n
] [
.B -t
] [
.B -a
] [
.BI -d delim
]
.I /mnt/doc/Workbook
.SH DESCRIPTION
.I Doc2txt
is a shell script that uses 
.I olefs
and
.I mswordstrings
to extract the printable text from the body of a Microsoft Word document.
.I Xls2txt
performs a similar function for Microsoft Excel documents.
.PP
Microsoft Office documents are stored in OLE (Object Linking and Embedding)
format, which is a scaled-down version of Microsoft's FAT file system.
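.PP
To make the FAT analogy concrete, here is a minimal Python sketch based on our reading of Microsoft's published compound-file layout (MS-CFB); it is not part of the Plan 9 sources, and every name in it is hypothetical. It checks the 8-byte signature, reads the sector size and the FAT sector numbers listed in the 512-byte header, and follows a chain of sectors through the FAT much as a FAT file system follows a chain of clusters. It assumes the whole file is in memory and ignores files large enough to need extra DIFAT sectors.

```python
import struct

# Every OLE2 compound document starts with this 8-byte signature.
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
ENDOFCHAIN = 0xFFFFFFFE  # FAT value marking the last sector of a chain
FREESECT = 0xFFFFFFFF    # FAT value marking an unused sector


def parse_header(data):
    """Read the header fields needed to walk the FAT."""
    if data[:8] != OLE_MAGIC:
        raise ValueError("not an OLE compound document")
    (sector_shift,) = struct.unpack_from("<H", data, 0x1E)
    num_fat, first_dir = struct.unpack_from("<II", data, 0x2C)
    # The header itself lists the first 109 FAT sectors (the DIFAT array).
    difat = struct.unpack_from("<109I", data, 0x4C)
    return {
        "sector_size": 1 << sector_shift,   # 512 for version-3 files
        "first_dir_sector": first_dir,
        "fat_sectors": list(difat[:min(num_fat, 109)]),
    }


def sector(data, hdr, n):
    """Return sector n; the header occupies the slot just before sector 0."""
    size = hdr["sector_size"]
    start = (n + 1) * size
    return data[start:start + size]


def load_fat(data, hdr):
    """Concatenate the FAT sectors into one list of next-sector pointers."""
    fat = []
    for s in hdr["fat_sectors"]:
        buf = sector(data, hdr, s)
        fat.extend(struct.unpack("<%dI" % (len(buf) // 4), buf))
    return fat


def chain(fat, start):
    """Follow a sector chain, as a FAT file system follows a cluster chain."""
    seen, out, n = set(), [], start
    while n != ENDOFCHAIN:
        if n >= len(fat) or n in seen:  # corrupt or looping chain
            raise ValueError("bad FAT chain")
        seen.add(n)
        out.append(n)
        n = fat[n]
    return out
```

Following the chain that starts at first_dir_sector yields the sectors holding the directory, whose entries name the streams that olefs serves.
.PP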
.I Olefs
presents the contents of an Office document as a file system
on
.IR mtpt ,
which defaults to
.BR /mnt/doc .
.I Mswordstrings
and
.I msexceltables
then parse the
.B WordDocument
and
.B Workbook
streams inside, respectively, extracting
a text stream.
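.PP
Each file that olefs serves corresponds to an entry in the document's directory, a sequence of 128-byte records. The Python sketch below is hypothetical and is not olefs's code; it assumes the directory stream has already been reassembled from its sector chain, and it reads each entry's name, type, starting sector and size at the offsets given in MS-CFB.

```python
import struct

# Object types stored at offset 0x42 of a directory entry; 0 marks an unused slot.
ENTRY_TYPES = {1: "storage", 2: "stream", 5: "root"}


def list_entries(dirstream):
    """Yield (name, type, start_sector, size) for each used 128-byte entry."""
    for off in range(0, len(dirstream) - 127, 128):
        entry = dirstream[off:off + 128]
        name_len, obj_type = struct.unpack_from("<HB", entry, 0x40)
        if obj_type not in ENTRY_TYPES:
            continue
        # The name is UTF-16LE and name_len counts bytes including the final NUL.
        name = entry[:max(name_len - 2, 0)].decode("utf-16-le")
        start, size = struct.unpack_from("<IQ", entry, 0x74)
        # Version-3 files (512-byte sectors) use only the low 32 bits of size.
        yield name, ENTRY_TYPES[obj_type], start, size & 0xFFFFFFFF
```

On a Word document one would expect a stream entry named WordDocument, and on a workbook one named Workbook, matching the paths olefs serves under /mnt/doc.
.PP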
.I Msexceltables
may be given options to control the formatting of its output.
.TP
.B -n
Disable padding of fields to the column width.
.TP
.B -t
Truncate fields to the column width.
.TP
.B -a
Attempt to convert non-tabular sheets in the workbook, such as charts.
.TP
.BI -d " delim"
Set the inter-field delimiter to the string
.IR delim ;
the default is a single space.
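.PP
The following Python function is only a model of how the -n, -t and -d options might combine. It assumes that fields are left-justified and that column widths are already known; it is not taken from msexceltables.c, and its name and parameters are hypothetical.

```python
def format_row(fields, widths, pad=True, truncate=False, delim=" "):
    """Hypothetical model of msexceltables output formatting.

    pad=False stands for -n, truncate=True for -t, delim for -d.
    """
    out = []
    for text, width in zip(fields, widths):
        if truncate:
            text = text[:width]   # -t: cut the field at the column width
        if pad:
            text = text.ljust(width)  # default: pad short fields with spaces
        out.append(text)
    return delim.join(out)
```

With both padding and truncation on, every field occupies exactly its column width, so columns line up; with -n, fields keep their natural length and only the delimiter separates them.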
.SH SOURCE
.B /sys/src/cmd/aux/mswordstrings.c
.br
.B /sys/src/cmd/aux/msexceltables.c
.br
.B /sys/src/cmd/aux/olefs.c
.br
.B /rc/bin/xls2txt
.br
.B /rc/bin/doc2txt
.SH BUGS
.I Msexceltables
cannot parse files containing rich-text field descriptions or Asian phonetic
pronunciation hints, owing to a lack of documentation on these formats.
It has been tested only on BIFF8 files generated by Microsoft Office 97;
caveat emptor.
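.PP
Because only BIFF8 workbooks have been tested, it may be worth checking a workbook's version before relying on the output. This Python sketch (hypothetical names, not part of Plan 9) walks the 4-byte record headers of a BIFF stream such as /mnt/doc/Workbook and reports whether the first record is a BOF (type 0x0809) whose version field is 0x0600, which, to our understanding, is how BIFF8 identifies itself. It does not interpret CONTINUE records or any record contents beyond the BOF.

```python
import struct

BOF = 0x0809            # beginning-of-file record in BIFF5 and BIFF8
BIFF8_VERSION = 0x0600  # version field of a BIFF8 BOF record


def records(stream):
    """Yield (record_type, payload) pairs: 2-byte type, 2-byte length, data."""
    off = 0
    while off + 4 <= len(stream):
        rtype, length = struct.unpack_from("<HH", stream, off)
        off += 4
        yield rtype, stream[off:off + length]
        off += length


def is_biff8(stream):
    """True if the stream opens with a BOF record carrying the BIFF8 version."""
    first = next(records(stream), None)
    if first is None:
        return False
    rtype, payload = first
    return (rtype == BOF and len(payload) >= 2
            and struct.unpack_from("<H", payload)[0] == BIFF8_VERSION)
```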
.SH SEE ALSO
.IR strings (1)
.br
``Microsoft Word 97 Binary File Format'',
available online at Microsoft's developer home page.
.br
``LAOLA Binary Structures'', 
.I http://snake.cs.tu-berlin.de:8081/~schwartz/pmh 
.br
``OpenOffice.org's Excel Documentation'',
.I http:\/\/sc.openoffice.org/excelfileformat.pdf
