diff --git a/ChangeLog b/ChangeLog index 2bce645..a808766 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,560 +1,561 @@ -LibPST 0.6.9 (2008-05-11) +LibPST 0.6.9 (2008-05-16) =============================== * Patch from Joachim Metz for 64 bit - compile - * Signed/unsigned cleanup from 'CFLAGS=-Wextra ./configure' + compile. + * Signed/unsigned cleanup from 'CFLAGS=-Wextra ./configure'. * Reindent vbuf.c to make it readable. + * Fix pst format documentation for 8 byte backpointers. LibPST 0.6.8 (2008-03-05) =============================== * Initial version of pst2dii to convert to Summation dii load file format. * Changes for Fedora packaging (#434727) LibPST 0.6.7 (2008-02-16) =============================== * Work around bogus 7c.b5 blocks in some messages that have been read. They appear to have attachments, but of some unknown format. Before the message was read, it did not have any attachments. * Use autoscan to cleanup our autoconf system. * Use autoconf to detect when we need to use our XGetopt files and other header files. * More fields, including BCC. * Fix missing LE32_CPU byte swapping for FILETIME types. LibPST 0.6.6 (2008-01-31) =============================== * More code cleanup, removing unnecessary null terminations on binary buffers. All pst file reads now go thru one function. Logging all pst reads to detect cases where we read the same data multiple times - discovers node sizes are actually 512 bytes. * Switch from cvs to mercurial source control. LibPST 0.6.5 (2008-01-22) =============================== * More code cleanup, removing obsolete code. All the boolean flags of type 0xb have length 4, so these are all 32 bits in the file. Libpst treats them all as 16 bits, but at least we are consistent. * More fields decoded - for example, see We should be able to use that data for much more complete decoding. * Move the rpm group to Applications/Productivity consistent with Evolution. 
LibPST 0.6.4 (2008-01-19) =============================== * More fixes for Outlook 2003 64 bit parsing. We observed cases of compressed RTF bodies (type 0x1009) with zero length. * Document type 0x0101 descriptor blocks and process them. * Fix large file support - we need to include config.h before any standard headers. * Merge following changes from svn snapshot from Alioth: * Add new fields to appointment for recurring events (SourceForge #304198) * Map IPM.Task items to PST_TYPE_TASK. * Applied patch to remove compiler warnings, thanks! (SourceForge #304314) * Fix crash with unknown reference type * Fix more memory issues detected by valgrind * lspst - add usage mesage and option parsing using getopt (SourceForge #304199) * Fix crash caused by invalid free calls * Fix crash when email subject is empty * Fix memory and information leak in hex debug dump LibPST 0.6.3 (2008-01-13) =============================== * More type consistency issues found by splint. LibPST 0.6.2 (2008-01-12) =============================== * More fixes for Outlook 2003 64 bit parsing. * All buffer sizes changed to size_t, all file offsets changed to off_t, all function names start with pst_, many other type consistency issues found by splint. Many changes to #llx in debug printing for 64 bit items. All id values are now uint64_t. LibPST 0.6.1 (2008-01-06) =============================== * Outlook 2003 64 bit parsing. Some documentation from Alexander Grau and patches from Sean Loaring . * fix from Antonio Palama for email items that happen to have item->contact non null, and were being processed as contacts. * Add large file support so we can read .pst files larger than 2gb. * Change lspst to be similar to readpst, properly using recursion to walk the tree, and testing item types. Add a man page for lspst. LibPST 0.5.12 (2007-10-02) =============================== * security fix from Brad Hards for buffer overruns in liv-zemple decoding for corrupted or malicious pst files. 
LibPST 0.5.11 (2007-08-24) =============================== * fix from Stevens Miller for unitialized variable. LibPST 0.5.10 (2007-08-20) =============================== * fix yet more valgrind errors - finally have a clean memory check. * restructure readpst.c for proper recursive tree walk. * buffer overrun test was backwards, introduced at 0.5.6 * fix broken email attachments, introduced at 0.5.6 LibPST 0.5.9 (2007-08-12) =============================== * fix more valgrind errors. LibPST 0.5.8 (2007-08-10) =============================== * fix more valgrind errors. lzfu_decompress needs to return the actual buffer size, since the lz header overestimates the size. This caused base64_encode to encode undefined bytes into the email attachment. LibPST 0.5.7 (2007-08-09) =============================== * fix valgrind errors, using uninitialized data. * improve debug logging and readpstlog for indented listings. * cleanup documentation. LibPST 0.5.6 (2007-07-15) =============================== * Fix to allow very small pst files with only one node in the tree. We were mixing signed/unsigned types in comparisons. * More progress decoding the basic structure 7c blocks. Many four byte values may be ID2 indices with data outside the buffer. * Start using doxygen to generate internal documentation. LibPST 0.5.5 (2007-07-10) =============================== * merge the following changes from Joe Nahmias version: * Lots of memory fixes. Thanks to Nigel Horne for his assistance tracking these down! * Fixed creation of vCards from contacts, thanks to Nigel Horne for his help with this! * fix for MIME multipart/alternative attachments. * added -c options to readpst manpage. * use 8.3 attachment filename if long filename isn't available. * new -b option to skip rtf-body.rtf attachments. * fix format of From header lines in mbox files. * Add more appointment fields, thanks to Chris Halls for tracking them down! 
LibPST 0.5.4 (2006-02-25) =============================== * patches from Arne, adding MH mode, remove leading zeros from the generated numbered filenames starting with one rather than zero. Miscellaneous code cleanup. * document the "7c" descriptor block format. LibPST 0.5.3 (2006-02-20) =============================== * switch to gnu autoconf/automake. This breaks the MS VC++ projects since the source code is now in the src subdirectory. * documentation switched to xml, building man pages and html from the master xml copy. * include rpm .spec file for building src and binary rpms. LibPST 0.5.2 (2006-02-18) =============================== * Added pst2ldif to convert the contacts to ldif format for import into ldap databases. * Major changes to libpst.c to properly use the node depth values from the b-tree nodes. We also use the item count values in the nodes rather than trying to guess how many items are active. * Cleanup whitespace - using tabs for every four columns. LibPST 0.5.1 (17 November 2004) =============================== Well, alot has happened since the last release of libpst. Release / Management: * The project has forked! The new maintainer is Joseph Nahmias. * We have changed hosting sites, thanks to sourceforge for hosting to this point. From this point forward we will be using alioth.debian.org. * The project is now using SubVersioN for source control. You can get the latest code by running: svn co svn://svn.debian.org/svn/libpst/trunk . * See for more information. Code Changes: * Added lspst program to list items in a PST. Still incomplete. 
* Added vim folding markers to readpst.c * avoid the pseudo-prologue that MS prepends to the email headers * fix build on msvc, since it doesn't have sys/param.h * Re-vamped Makefile: * Only define CFLAGS in Makefileif missing * fixed {un,}install targets in Makefile * Fixed up build process in Makefile * Added mozilla conversion script from David Binard * Fixed bogus creation of readpst.log on every invocation * escaped dashes and apostrophe in manpages * Updated TODO * added manpages from debian pkg * fix escaped-string length count to consider '\n', thanks to Paul Bakker . * ensure there's a blank line between header and body patch from (SourceForge #890745). * Apply accumulated endian-related patches * Removed unused files, upstream's debian/ dir -- Joe Nahmias LibPST v0.5 =========== It is with GREAT relief that I bring you version 0.5 of the LibPST tools! Through great difficulties, this tool has survived and expanded to become even better. The changes are as follows: * RTF support. We can now decompress RTF bodies in emails, and are saved as attachments * Better support in reading the indexes. Fixed many bugs with them * Improved reliability. "Now we are getting somewhere!" * Improved compiling. Hopefully we won't be hitting too many compile errors now. * vCard handling. Contacts are now exported as vCard entries. * vEvent handling. Support has begun on exporting Calendar entries as events * Support for Journal entries has also begun If you have any problems with this release, don't hesitate to contact me. These changes come to you, as always, free under the GPL license!! What a wonderful thing it is. It does mean that you can write your own program off of this library and distribute it also for free. However, anyone with commercial interests for developing applications they will be charging for are encouraged to get in touch with me, as I am sure we can come to some arrangement. Dave Smith LibPST v0.4.3 ============= Bug fix release. 
No extra functionality Dave Smith LibPST v0.4.2 ============= The debug system has had an overhaul. The debug messages are no longer printed to the screen when they are enabled. They are dumped to a binary file. There is another utility called "readlog" that I have written to handle these log files. It should make it easier to selectively view bits of a log file. It also shows the position that the log message was printed from. There is a new switch in readpst. It is -d. It enables the user to specify the log file which the binary log is written to. If the switch isn't used, the default file of "readpst.log" is used. The code is now Visual C++ compatible. It has compiled on Visual C++ .net Standard edition, and produces the readpst.exe file. Use the project file included in this distribution. There have been minor improvements elsewhere too. LibPST v0.4.1 ============= Fixed a couple more bugs. Is it me or do bugs just insert themselves in random, hard to find places! Cured a few problems with regard to emails with multiple embeded items. They are not fully re-created using Mime-types, but are accessible with the -S switch (which saves everything as seperate items) Fixed a problem reading the first index. Back sliders are now detected. (ie when the value following the current one is smaller, not bigger!) Added some error messages when we try and read outside of the PST file, this was causing a few problems before, cause the return value wasn't always checked, so it was possible to be reading random data, and trying to make sense of it! Anyway, if you find any problems, don't hesitate to mail me Dave Smith LibPST v0.4 =========== Fixed a nasty bug that occasionally corrupted attachments. Another bug with regard to reading of indexes (also occasional). Another output method has been added which is called "Seperate". It is activated with the -S switch. 
It operates in the following manor: |--Inbox-->000000 | 000001 | 000002 |--Sentmail-->0000000 | 0000001 | 0000002 All the emails are stored in seperate files counting from 0 upwards, in a folder named as the PST folder. When an email has an attachment, it is saved as a seperate file. The filename for the attachment is made up of 2 parts, the first is the email number to which it belongs, the second is its filename. The should now be runnable on big-endian machines, if the define.h file is first modified. The #define LITTLE_ENDIAN must be commented out, and the #define BIG_ENDIAN must be uncommented. More verbose error messages have been added. Apparently people got confused when the program stopped for no visible reason. This has now been resolved. Thanks for the continued support of all people involved. Dave Smith Libpst v0.3.4 ============= Several more fixes. An Infinite loop and incorrect interpreting of item index attributes. Work has started on making the code executable on big endian CPUs. At present it should work with Linux on these CPUs, but I would appreciate it if you could provide feedback with regard to it's performance. I am also working with some other people at make it operate on Solaris. A whole load more items are now recognized by the Item records. With more items in Emails and Folders. I haven't got to the Contacts yet. Anyway, this is what I would call a minor feature enhancment and bugfix release. Dave Smith LibPST v0.3.3 ============= Fixed several items. Mainly memory leaks. Loads of them! oops.. I have added a new program, mainly of debugging, which when passed an ID value and a pst file, will extract and decrypt that ID from the pst file. I don't see it being a huge attraction, or of much use to most people, but it is another example of writing an application to use the libpst interface. Another fix was in the reading of the item index. This has hopefully now been corrected. 
The result of this bug was that not all the emails in a folder were converted. Hopefully you should have more luck now. Dave Smith LibPST v0.3.2 ============= Quick bugfix release. There was a bug in the decryption of the basic encryption that outlook uses. One byte, 0x6c, was incorrectly decrypted to 0x6c instead of 0xcd. This release fixes this bug. Sorry... LibPST v0.3.1 ============= Minor improvements. Fixed bug when linking multiple blocks together, so now the linking blocks are not "encrypted" when trying to read them. LibPST v0.3 =========== A lot of bug fixing has been done for this release. Testing has been done on the creation of the files by readpst. Better handling of large binaries being extracted from the PST file has been implemented. Quite a few reports have come in about not being able to compile on Darwin. This could be down to using macros with variable parameter lists. This has now been changed to use C functions with variable parameters. I hope this fixes a lot of problems. Added support for recreating the folder structure into normal directories. For Instance: Personal Folders |-Inbox | |-Jokes | |-Meetings |-Send Items each folder containing an mbox file with the correct emails for that folder. Dave Smith LibPST v0.3 beta1 ================= Again, a shed load of enhancements. More work has been done on the mime creation. A bug has been fixed that was letting part of the attachments that were created disappear. A major enhancement is that "compressible encryption" support has been added. This was an incredibly simple method to use. It is basically a ceasar cipher. It has been noted by several users already that the PST password that Outlook uses, serves *no purpose*. It is not used to encrypt the PST, it is mearly stored there. This means that the readpst application is able to convert PST files without knowing the password. Microsoft have some explaning to do! Output files are now not overwritten if they already exist. 
This means that if you have two folders in your PST file named "fred", the first one encountered will be named "fred" and the second one will be named "fred00000001". As you can see, there is enough room there for many duplicate names! Output filenames are now restricted. Any "/" or "\" characters in the name are replaced with "_". If you find that there are any other characters that need to be changed, could you please make me aware! Thanks to Berry Wizard for help with supporting the encryption. Thanks to Auke Kok, Carolus Walraven and Yogesh Kumar Guatam for providing debugging information and testing. Dave Smith LibPST v0.2 beta1 ================= Hello once more... Attachments are now re-created in mime format. The method is very crude and could be prone to over generalisation. Please test this version, and if attachments are not recreated correctly, please send me the email (complete message source) of the original and converted. Cheers. I hope this will work for everyone who uses this program, but reality can be very different! Let us see how it goes... Dave Smith LibPST v0.2 alpha1 =========== Hello! Some improvements. The internal code has been changed so that attachments are now processed and loaded into the structures. The readpst program is not finished yet. It needs to convert these binary structs into mime data. At present it just saves them to the current directory, overwriting any previous files with the attachment name. Improvements over previous version: * KMail output is supported - if the "-k" flag is specified, all the directory hierarchy is created using the KMail standard * Lots of bugs and memory leaks fixed Usage: ReadPST v0.2alpha1 implementing LibPST v0.2alpha1 Usage: ./readpst [OPTIONS] {PST FILENAME} OPTIONS: -h - Help. This screen -k - KMail. Output in kmail format -o - Output Dir. Directory to write files to. CWD is changed *after* opening pst file -V - Version. 
Display program version If you want to view lots of debug output, modify a line in "define.h" from "//#define DEBUG_ALL" to "#define DEBUG_ALL". It would then be advisable to pipe all output to a log file: ./readpst -o out pst_file &> logfile Dave Smith LibPST v0.1 =========== Hi Folks! This has been a long, hard slog, but I now feel that I have got somewhere useful. The included program "main" is able to read an Outlook PST file and dump the emails into mbox files, separating each folder into a different mbox file. All the mbox files are stored in the current directory and no attempt is yet made to organise these files into a directory hierarchy. This would not be too difficult to achieve though. Email attachments are not yet handled, neither are Contacts. There is no pretty interface yet, but you can convert a PST file in the following manner ./main {path to PST file} This is very much a work in progress, but I thought I should release this code so that people can lose their conception that outlook files will never be converted to Linux. I am intending that the code I am writing will be developed into greater applications to provide USEFUL tools for accessing and converting PST files into a variety of formats. One point I feel I should make is that Outlook, by default, creates "Compressible Encryption" PST files. I have not, as yet, attempted to write any decryption routines, so you will not be able to convert these files. However, if you create a new PST file and choose not to make an encrypted one, you can copy all your emails into this new one and then convert the unencrypted one. I hope you enjoy, Dave Smith diff --git a/NEWS b/NEWS index 5445c65..b66e219 100644 --- a/NEWS +++ b/NEWS @@ -1,20 +1,20 @@ -0.6.9 2008-05-11 Patch from Joachim Metz for 64 bit compile. +0.6.9 2008-05-16 Patch from Joachim Metz for 64 bit compile. 0.6.8 2008-03-05 Initial version of pst2dii to convert to Summation dii load file format. 
0.6.7 2008-02-16 Ignore unknown attachments on some read messages; autoconf cleanup. 0.6.6 2008-01-31 Code cleanup, switch from cvs to mercurial source control. 0.6.5 2008-01-22 Code cleanup, rpm group Applications/Productivity. 0.6.4 2008-01-19 More fixes for 64 bit format, merge changes from svn Alioth. 0.6.3 2008-01-13 More type consistency issues found by splint. 0.6.2 2008-01-12 More fixes for 64 bit format, consistent types size_t, off_t, etc. 0.6.1 2008-01-06 Outlook 2003 64 bit format and fix for bogus contacts. 0.5.12 2007-10-02 security fix for possible buffer overruns in liv-zemple decoding 0.5.11 2007-08-24 fix for unitialized variable 0.5.10 2007-08-20 fix yet more valgrind errors, restructure readpst recursive walk, backwards overrun test 0.5.9 2007-08-12 fix more valgrind errors, pst2ldif wrote undefined data 0.5.8 2007-08-10 lzfu_decompress/base64_encode encoded random data into attachment 0.5.7 2007-08-09 fix valgrind errors, using uninitialized data 0.5.6 2007-07-15 handle small pst files, better decoding of 7c blocks 0.5.5 2007-07-10 merge changes from Joe Nahmias version 0.5.4 2006-02-25 add MH mode, generated filenames with no leading zeros 0.5.3 2006-02-20 switch to gnu autoconf/automake 0.5.2 2006-02-18 add pst2ldif, fix btree processing in libpst.c diff --git a/libpst.spec.in b/libpst.spec.in index 5293a63..e18f605 100644 --- a/libpst.spec.in +++ b/libpst.spec.in @@ -1,62 +1,63 @@ Summary: Utilities to convert Outlook .pst files to other formats Name: @PACKAGE@ Version: @VERSION@ Release: 1%{?dist} License: GPLv2+ Group: Applications/Productivity Source: http://www.five-ten-sg.com/%{name}/packages/%{name}-%{version}.tar.gz BuildRoot: %(mktemp -ud %{_tmppath}/%{name}-%{version}-%{release}-XXXXXX) URL: http://www.five-ten-sg.com/%{name}/ Requires: ImageMagick BuildRequires: ImageMagick freetype-devel gd-devel libjpeg-devel zlib-devel %description The Libpst utilities include readpst which can convert email messages to both mbox and MH mailbox 
formats, pst2ldif which can convert the contacts to .ldif format for import into ldap databases, and pst2dii which can convert email messages to the DII load file format used by Summation. %prep %setup -q %build %configure make %{?_smp_mflags} %install rm -rf $RPM_BUILD_ROOT make DESTDIR=$RPM_BUILD_ROOT install %clean rm -rf $RPM_BUILD_ROOT %files %defattr(-,root,root,-) %{_bindir}/* %{_mandir}/man1/* %{_mandir}/man5/* %docdir %{_datadir}/doc/%{name}-%{version} %{_datadir}/doc/%{name}-%{version} %changelog -* Sun May 11 2008 Carl Byington - 0.6.9 +* Fri May 16 2008 Carl Byington - 0.6.9 - Patch from Joachim Metz for 64 bit compile. +- Fix pst format documentation for 8 byte backpointers. * Wed Mar 05 2008 Carl Byington - 0.6.8 - Initial version of pst2dii to convert to Summation dii load file format - changes for Fedora packaging guidelines (#434727) * Tue Jul 10 2007 Carl Byington - 0.5.5 - merge changes from Joe Nahmias version * Sun Feb 19 2006 Carl Byington - 0.5.3 - initial spec file using autoconf and http://www.fedora.us/docs/rpm-packaging-guidelines.html diff --git a/regression/regression-tests.bash b/regression/regression-tests.bash index ec10c05..5e638e4 100644 --- a/regression/regression-tests.bash +++ b/regression/regression-tests.bash @@ -1,52 +1,54 @@ #!/bin/bash val="valgrind --leak-check=full" val='' for i in {1..10}; do rm -rf output$i mkdir output$i done -hash=$(md5sum ams.pst) -pre="$hash -bates-" -$val ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "$pre" -o output1 -O mydii -d dumper ams.pst - ../src/readpstlog -f I dumper >ams.log -$val ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "bates-" -o output2 -O mydii2 -d dumper sample_64.pst - ../src/readpstlog -f I dumper >sample_64.log -$val ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "bates-" -o output3 -O mydii3 -d dumper test.pst - ../src/readpstlog -f I dumper >test.log - ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf 
-B "bates-" -o output4 -O mydii4 -d dumper big_mail.pst - ../src/readpstlog -f I dumper >big_mail.log -exit +if [ "$1" == "dii" ]; then + hash=$(md5sum ams.pst) + pre="$hash + bates-" + $val ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "$pre" -o output1 -O mydii -d dumper ams.pst + ../src/readpstlog -f I dumper >ams.log + $val ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "bates-" -o output2 -O mydii2 -d dumper sample_64.pst + ../src/readpstlog -f I dumper >sample_64.log + $val ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "bates-" -o output3 -O mydii3 -d dumper test.pst + ../src/readpstlog -f I dumper >test.log + ../src/pst2dii -f /usr/share/fonts/bitstream-vera/VeraMono.ttf -B "bates-" -o output4 -O mydii4 -d dumper big_mail.pst + ../src/readpstlog -f I dumper >big_mail.log + exit +fi $val ../src/pst2ldif -b 'o=ams-cc.com, c=US' -c 'newPerson' ams.pst >ams.err 2>&1 $val ../src/readpst -cv -o output1 ams.pst >out1.err 2>&1 $val ../src/readpst -cl -r -o output2 ams.pst >out2.err 2>&1 $val ../src/readpst -S -o output3 ams.pst >out3.err 2>&1 $val ../src/readpst -M -o output4 ams.pst >out4.err 2>&1 $val ../src/readpst -o output5 -d dumper mbmg.archive.pst >out5.err 2>&1 ../src/readpstlog -f I dumper >mbmg.archive.log $val ../src/readpst -o output6 -d dumper test.pst >out6.err 2>&1 ../src/readpstlog -f I dumper >test.log $val ../src/readpst -cv -o output7 -d dumper sample_64.pst >out7.err 2>&1 ../src/readpstlog -f I dumper >sample_64.log $val ../src/readpst -cv -o output8 -d dumper big_mail.pst >out8.err 2>&1 ../src/readpstlog -f I dumper >big_mail.log $val ../src/readpst -cv -o output9 -d dumper Single2003-read.pst >out9.err 2>&1 ../src/readpstlog -f I dumper >Single2003-read.log $val ../src/readpst -cv -o output10 -d dumper Single2003-unread.pst >out10.err 2>&1 ../src/readpstlog -f I dumper >Single2003-unread.log $val ../src/lspst -d dumper ams.pst >out11.err 2>&1 ../src/readpstlog -f I dumper >ams.log rm 
-f dumper diff --git a/xml/libpst.in b/xml/libpst.in index 49db06e..35cfa00 100644 --- a/xml/libpst.in +++ b/xml/libpst.in @@ -1,2041 +1,2049 @@ @PACKAGE@ Utilities - Version @VERSION@ Packages This is a fork of the libpst project at SourceForge. Another fork is located at http://alioth.debian.org/projects/libpst/ The various source and binary packages are available at http://www.five-ten-sg.com/@PACKAGE@/packages/. The most recent documentation is available at http://www.five-ten-sg.com/@PACKAGE@/. A Mercurial source code repository for this project is available at http://hg.five-ten-sg.com/@PACKAGE@/. This version can now convert both 32 bit Outlook files (pre 2003), and the 64 bit Outlook 2003 pst files. Utilities are supplied to convert email messages to both mbox and MH mailbox formats, and to DII load file format for use with many of the CT Summation products. Contacts can be converted to a simple list, to vcard format, or to ldif format for import to an LDAP server. - 2008-02-23 + 2008-05-16 readpst 1 readpst @VERSION@ readpst convert PST (MS Outlook Personal Folders) files to mbox and other formats Synopsis readpst pstfile Description readpst is a program that can read an Outlook PST (Personal Folders) file and convert it into an mbox file, a format suitable for KMail, a recursive mbox structure, or separate emails. Options -C Decrypt the entire pst file and dump it to stdout. -M Output messages in MH format as separate files. This will create folders as named in the PST file, and will put each email together with any attachments into its own file. These files will be numbered from 1 to n with no leading zeros. -S Output messages into separate files. This will create folders as named in the PST file, and will put each email in its own file. These files will be numbered from 1 increasing in intervals of 1 (ie 1, 2, 3, ...). 
Any attachments are saved alongside each email as XXXXXXXXX-attach1, XXXXXXXXX-attach2 and so on, or with the name of the attachment if one is present. -V Show program version and exit. -b Do not save the attachments for the RTF format of the email body. -c format Set the Contact output mode. Use -cv for vcard format or -cl for an email list. -d debug-file Specify name of debug log file. The log file is not an ascii file, it is a binary file readable by readpstlog. -h Show summary of options and exit. -k Changes the output format to KMail. -o output-directory Specifies the output directory. The directory must already exist, and is entered after the PST file is opened, but before any processing of files commences. -q Changes to silent mode. No feedback is printed to the screen, except for error messages. -r Changes the output format to Recursive. This will create folders as named in the PST file, and will put all emails in a file called "mbox" inside each folder. These files are then compatible with all mbox-compatible email clients. -w Overwrite any previous output files. Beware: When used with the -S switch, this will remove all files from the target folder before writing. This is to keep the count of emails and attachments correct. See Also readpstlog 1 Author This manual page was originally written by Dave Smith <dave.s@earthcorp.com>, and updated by Joe Nahmias <joe@nahmias.net> for the Debian GNU/Linux system (but may be used by others). It was subsequently updated by Brad Hards <bradh@frogmouth.net>, and converted to xml format by Carl Byington <carl@five-ten-sg.com>. Copyright Copyright (C) 2002 by David Smith <dave.s@earthcorp.com>. XML version Copyright (C) 2006 by 510 Software Group <carl@five-ten-sg.com>. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. 
You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. Version @VERSION@ - 2008-02-23 + 2008-05-16 lspst 1 lspst @VERSION@ lspst list PST (MS Outlook Personal Folders) file data Synopsis lspst pstfile Options -V Show program version and exit. -d debug-file Specify name of debug log file. The log file is not an ascii file, it is a binary file readable by readpstlog. -h Show summary of options and exit. Description lspst is a program that can read an Outlook PST (Personal Folders) file and produce a simple listing of the data (contacts, email subjects, etc). See Also readpstlog 1 Author lspst was written by Joe Nahmias <joe@nahmias.net> based on readpst. This man page was written by 510 Software Group <carl@five-ten-sg.com>. Copyright Copyright (C) 2004 by Joe Nahmias <joe@nahmias.net>. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. Version @VERSION@ - 2008-02-23 + 2008-05-16 readpstlog 1 readpstlog @VERSION@ readpstlog convert a readpst logfile to text format Synopsis readpstlog logfile Description readpstlog is a program that converts the binary logfile generated by readpst to a more desirable text format. Options -f format Sets the format of the text log output. Currently, the only valid output formats are T, for single line text, D for the default default multi line format, and I for an indented style with single line text. -t include-types Print only the specified types of log messages. 
Types are specified in a comma-delimited list (e.g. 3,10,5,6). -x exclude-types Exclude the specified types of log messages. Types are specified in a comma-delimited list (e.g. 3,10,5,6). Message Types readpstlog understands the following types of log messages: 1 File accesses 2 Index accesses 3 New email found 4 Warnings 5 Read accesses 6 Informational messages 7 Main function calls 8 Decrypting calls 9 Function entries 10 Function exits 11 HexDump calls Author This manual page was written by Joe Nahmias <joe@nahmias.net> for the Debian GNU/Linux system (but may be used by others). It was converted to xml format by Carl Byington <carl@five-ten-sg.com>. Copyright Copyright (C) 2002 by David Smith <dave.s@earthcorp.com>. XML version Copyright (C) 2005 by 510 Software Group <carl@five-ten-sg.com>. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. Version @VERSION@ - 2008-02-23 + 2008-05-16 pst2ldif 1 pst2ldif @VERSION@ pst2ldif extract contacts from a MS Outlook .pst file in .ldif format Synopsis pst2ldif pstfilename Options -V Show program version. Subsequent options are then ignored. -b ldap-base Sets the ldap base value used in the dn records. You probably want to use something like "o=organization, c=US". -c class Sets the objectClass values for the contact items. This class needs to be defined in the schema used by your LDAP server, and at a minimum it must contain the ldap attributes given below. -d debug-file Specify name of debug log file. The log file is not an ascii file, it is a binary file readable by readpstlog. -h Show summary of options. Subsequent options are then ignored. 
Description pst2ldif reads the contact information from a MS Outlook .pst file and produces a .ldif file that may be used to import those contacts into an LDAP database. The following ldap attributes are generated: cn givenName sn personalTitle company mail postalAddress l st postalCode c homePhone telephoneNumber facsimileTelephoneNumber mobile description Copyright Copyright (C) 2006 by 510 Software Group <carl@five-ten-sg.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. Version @VERSION@ - 2008-02-23 + 2008-05-16 pst2dii 1 pst2dii @VERSION@ pst2dii extract email messages from a MS Outlook .pst file in DII load format Synopsis pst2dii pstfilename Options -B bates-prefix Sets the bates prefix string. The bates sequence number is appended to this string, and printed on each page. -O dii-output-file Name of the output DII load file. -V Show program version. Subsequent options are then ignored. -b bates-number Starting bates sequence number. The default is zero. -c bates-color Font color for the bates stamp on each page, specified as 6 hex digits as rrggbb values. The default is ff0000 for bright red. -d debug-file Specify name of debug log file. The log file is not an ascii file, it is a binary file readable by readpstlog. -f ttf-font-file Specify name of a true type font file. This should be a fixed pitch font. -h Show summary of options. Subsequent options are then ignored. -o output-directory Specifies the output directory. The directory must already exist. 
Description pst2dii reads the email messages from a MS Outlook .pst file and produces a DII load file that may be used to import message summaries into a Summation DII system. The DII output file contains references to the image and attachment files in the output directory. Copyright Copyright (C) 2008 by 510 Software Group <carl@five-ten-sg.com> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version. You should have received a copy of the GNU General Public License along with this program; see the file COPYING. If not, please write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA. Version @VERSION@ - 2008-02-23 + 2008-05-16 outlook.pst 5 outlook.pst format of MS Outlook .pst file Synopsis outlook.pst Overview Each item in a .pst file is identified by two id values ID1 and ID2. There are two separate b-trees indexed by these ID1 and ID2 values. Starting with Outlook 2003, the file format changed from one with 32 bit pointers, to one with 64 bit pointers. We describe both formats here. 32 bit File Header The 32 bit file header is located at offset 0 in the .pst file. We only support index types 0x0e and 0x17, and encryption types 0x00 and 0x01. Index type 0x0e is the older 32 bit Outlook format. Index type 0x17 is the newer 64 bit Outlook format. Encryption type 0x00 is no encryption, and type 0x01 is the only other supported encryption type. offsetIndex1 is the file offset of the root of the index1 b-tree, which contains (ID1, offset, size, unknown) tuples for each item in the file. backPointer1 is the value that should appear in the parent pointer of that root node. offsetIndex2 is the file offset of the root of the index2 b-tree, which contains (ID2, DESC-ID1, LIST-ID1, PARENT-ID2) tuples for each item in the file. 
backPointer2 is the value that should appear in the parent pointer of that root node. 64 bit File Header The 64 bit file header is located at offset 0 in the .pst file. 32 bit Index 1 Node The 32 bit index1 b-tree nodes are 512 byte blocks with the following format. The itemCount specifies the number of 12 byte records that are active. The nodeLevel is non-zero for this style of nodes. The leaf nodes have a different format. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a triple of (ID1, backPointer, offset) where the offset points to the next deeper node in the tree, the backPointer value must match the backPointer in that deeper node, and ID1 is the lowest ID1 value in the subtree. 64 bit Index 1 Node The 64 bit index1 b-tree nodes are 512 byte blocks with the following format. The itemCount specifies the number of 24 byte records that are active. The nodeLevel is non-zero for this style of nodes. The leaf nodes have a different format. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a triple of (ID1, backPointer, offset) where the offset points to the next deeper node in the tree, the backPointer value must match the backPointer in that deeper node, and ID1 is the lowest ID1 value in the subtree. 32 bit Index 1 Leaf Node The 32 bit index1 b-tree leaf nodes are 512 byte blocks with the following format. The itemCount specifies the number of 12 byte records that are active. The nodeLevel is zero for these leaf nodes. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a tuple of (ID1, offset, size, unknown). The two low order bits of the ID1 value seem to be flags. I have never seen a case with bit zero set. Bit one indicates that the item is not encrypted.
Note that references to these ID1 values elsewhere may have the low order bit set (and I don't know what that means), but when we do the search in this tree we need to clear that bit so that we can find the correct item. 64 bit Index 1 Leaf Node The 64 bit index1 b-tree leaf nodes are 512 byte blocks with the following format. The itemCount specifies the number of 24 byte records that are active. The nodeLevel is zero for these leaf nodes. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a tuple of (ID1, offset, size, unknown). The two low order bits of the ID1 value seem to be flags. I have never seen a case with bit zero set. Bit one indicates that the item is not encrypted. Note that references to these ID1 values elsewhere may have the low order bit set (and I don't know what that means), but when we do the search in this tree we need to clear that bit so that we can find the correct item. 32 bit Index 2 Node The 32 bit index2 b-tree nodes are 512 byte blocks with the following format. The itemCount specifies the number of 12 byte records that are active. The nodeLevel is non-zero for this style of nodes. The leaf nodes have a different format. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a triple of (ID2, backPointer, offset) where the offset points to the next deeper node in the tree, the backPointer value must match the backPointer in that deeper node, and ID2 is the lowest ID2 value in the subtree. 64 bit Index 2 Node The 64 bit index2 b-tree nodes are 512 byte blocks with the following format. The itemCount specifies the number of 24 byte records that are active. The nodeLevel is non-zero for this style of nodes. The leaf nodes have a different format. The backPointer must match the backPointer from the triple that pointed to this node.
Each item in this node is a triple of (ID2, backPointer, offset) where the offset points to the next deeper node in the tree, the backPointer value must match the backPointer in that deeper node, and ID2 is the lowest ID2 value in the subtree. 32 bit Index 2 Leaf Node The 32 bit index2 b-tree leaf nodes are 512 byte blocks with the following format. The itemCount specifies the number of 16 byte records that are active. The nodeLevel is zero for these leaf nodes. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a tuple of (ID2, DESC-ID1, LIST-ID1, PARENT-ID2). 64 bit Index 2 Leaf Node The 64 bit index2 b-tree leaf nodes are 512 byte blocks with the following format. The itemCount specifies the number of 32 byte records that are active. The nodeLevel is zero for these leaf nodes. The backPointer must match the backPointer from the triple that pointed to this node. Each item in this node is a tuple of (ID2, DESC-ID1, LIST-ID1, PARENT-ID2). 32 bit Associated List Item 0x0002 Contains associations between ID1 and ID2 for the items controlled by the record. In the above 32 bit leaf node, we have a tuple of (0x61, 0x02a82c, 0x02a836, 0). 0x02a836 is the ID1 of the associated list, and we can look up that ID1 value in the index1 b-tree to find the (offset,size) of the data in the .pst file. 64 bit Associated List Item 0x0002 Contains associations between ID1 and ID2 for the items controlled by the record. Associated Descriptor Item 0xbcec Contains information about the item, which may be email, contact, or other Outlook types. In the above leaf node, we have a tuple of (0x21, 0x00e638, 0, 0). 0x00e638 is the ID1 of the associated descriptor, and we can look up that ID1 value in the index1 b-tree to find the (offset,size) of the data in the .pst file. Note the signature of 0xbcec. There are other descriptor block formats with other signatures.
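The ID1 lookups mentioned above, together with the flag bits described for the index1 leaf nodes, can be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not libpst code: the names find_index1 and is_encrypted are hypothetical, the leaf entries are assumed to be already decoded into (ID1, offset, size) tuples sorted by ID1, and the offsets and sizes in the sample data are made up for illustration. Only the two ID1 values are taken from the text.

```python
from bisect import bisect_left

def is_encrypted(id1):
    """Bit one of an ID1 set means the item is NOT encrypted (per the text)."""
    return (id1 & 0x2) == 0

def find_index1(entries, id1):
    """Look up an ID1 in a sorted list of (ID1, offset, size) tuples.

    References to an ID1 may carry bit zero set, so that bit is cleared
    before searching. Returns the matching tuple, or None if absent.
    """
    key = id1 & ~0x1
    ids = [entry[0] for entry in entries]
    pos = bisect_left(ids, key)
    if pos < len(ids) and ids[pos] == key:
        return entries[pos]
    return None

# Hypothetical decoded leaf entries; offsets and sizes are invented.
SAMPLE_ENTRIES = [
    (0x00E638, 0x4400, 0x0150),   # descriptor ID1 from the text
    (0x02A836, 0x9C00, 0x0040),   # associated list ID1 from the text
]
```

A reference such as 0x00e639 (bit zero set) would then resolve to the same entry as 0x00e638.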
Note the indexOffset of 0x013c - starting at that position in the descriptor block, we have an array of two byte integers. The first integer (0x000b) is a (count-1) of the number of overlapping pairs following the count. The first pair is (0, 0xc), the next pair is (0xc, 0x14) and the last (12th) pair is (0x123, 0x13b). These pairs are (start,end+1) offsets of items in this block. So we have count+2 integers following the count value. Note the b5offset of 0x0020, which is a type that I will call an index reference. Such index references have at least two different forms, and may point to data either in this block, or in some other block. External pointer references have the low order 4 bits all set, and are ID2 values that can be used to fetch data. This value of 0x0020 is an internal pointer reference, which needs to be right shifted by 4 bits to become 0x0002, which is then a byte offset to be added to the above indexOffset plus two (to skip the count), so it points to the (0xc, 0x14) pair. So far we have only described internal index references where the high order 16 bits are zero. That suffices for single descriptor blocks. But in the case of the type 0x0101 descriptor block, we have an array of subblocks. In this case, the high order 16 bits of an internal index reference are used to select the subblock. Each subblock starts with a 16 bit indexOffset which points to the count and array of 16 bit integer pairs which are offsets in the current subblock. Finally, we have the offset and size of the "b5" block located at offset 0xc with a size of 8 bytes in this descriptor block. The "b5" block has the following format: Note the descoffset of 0x0040, which again is an index reference. In this case, it is an internal pointer reference, which needs to be right shifted by 4 bits to become 0x0004, which is then a byte offset to be added to the above indexOffset plus two (to skip the count), so it points to the (0x14, 0x7c) pair. 
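The index reference arithmetic described above can be sketched as follows. This is an illustrative Python sketch, not the libpst implementation: the function names are hypothetical, 16 bit values are assumed to be stored little-endian, and the sample block is synthetic (it only reproduces the first few offsets quoted in the text, with the index table placed at an arbitrary position).

```python
import struct

def is_external(ref):
    """External references have the low order 4 bits all set (an ID2 value)."""
    return (ref & 0xF) == 0xF

def resolve_internal(block, index_offset, ref):
    """Resolve an internal index reference to (subblock, start, size).

    The low 16 bits, shifted right by 4, give a byte offset into the
    array of 16 bit integers that follows the count at index_offset.
    The high 16 bits select the subblock (zero for a single block).
    """
    if is_external(ref):
        raise ValueError("external reference: use it as an ID2 value")
    subblock = ref >> 16
    byte_off = (ref & 0xFFFF) >> 4
    # index_offset + 2 skips the count; the pair is (start, end+1).
    start, end = struct.unpack_from("<HH", block, index_offset + 2 + byte_off)
    return subblock, start, end - start

def build_sample_block():
    """Synthetic block: stored count 2, then offsets 0, 0xc, 0x14, 0x7c."""
    block = bytearray(0x20)
    struct.pack_into("<5H", block, 0x10, 2, 0x0000, 0x000C, 0x0014, 0x007C)
    return bytes(block), 0x10

block, index_offset = build_sample_block()
b5 = resolve_internal(block, index_offset, 0x0020)    # (0, 0x0c, 8)
desc = resolve_internal(block, index_offset, 0x0040)  # (0, 0x14, 0x68)
```

With the stored count of 2 there are 2+2 = 4 integers after the count, forming 3 overlapping pairs, which mirrors the count+2 rule above on a smaller table.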
We now have the offset 0x14 of the descriptor array, composed of 8 byte entries. Each descriptor entry has the following format: For some reference types (2, 3, 0xb) the value is used directly. Otherwise, the value is an index reference, which is either an ID2 value, or an offset, to be right shifted by 4 bits and used to fetch a pair from the index table to find the offset and size of the item in this descriptor block. The following reference types are known, but not all of these are implemented in the code yet. The following item types are known, but not all of these are implemented in the code yet. Associated Descriptor Item 0x7cec This style of descriptor block is similar to the 0xbcec format. Note the signature of 0x7cec. There are other descriptor block formats with other signatures. Note the indexOffset of 0x017a - starting at that position in the descriptor block, we have an array of two byte integers. The first integer (0x0006) is a (count-1) of the number of overlapping pairs following the count. The first pair is (0, 0xc), the next pair is (0xc, 0x14) and the last (7th) pair is (0x160, 0x179). These pairs are (start,end+1) offsets of items in this block. So we have count+2 integers following the count value. Note the 7coffset of 0x0040, which is an index reference. In this case, it is an internal reference pointer, which needs to be right shifted by 4 bits to become 0x0004, which is then a byte offset to be added to the above indexOffset plus two (to skip the count), so it points to the (0x14, 0xea) pair. We have the offset and size of the "7c" block located at offset 0x14 with a size of 214 bytes in this case. The "7c" block starts with a header with the following format: Note the b5Offset of 0x0020, which is an index reference. 
In this case, it is an internal reference pointer, which needs to be right shifted by 4 bits to become 0x0002, which is then a byte offset to be added to the above indexOffset plus two (to skip the count), so it points to the (0xc, 0x14) pair. Finally, we have the offset and size of the "b5" block located at offset 0xc with a size of 8 bytes in this descriptor block. The "b5" block has the following format: Note the descoffset of 0x0060, which again is an index reference. In this case, it is an internal pointer reference, which needs to be right shifted by 4 bits to become 0x0006, which is then a byte offset to be added to the above indexOffset plus two (to skip the count), so it points to the (0xea, 0xf0) pair. That gives us (0xf0 - 0xea)/6 = 1, so we have a recordCount of one. The actual data between 0xea and 0xf0 is unknown and unused here. We have seen cases where the descoffset in the b5 block is zero, and the index2Offset in the 7c block is zero. This has been seen for objects that seem to be attachments on messages that have been read. Before the message was read, it did not have any attachments. Note the index2Offset above of 0x0080, which again is an index reference. In this case, it is an internal pointer reference, which needs to be right shifted by 4 bits to become 0x0008, which is then a byte offset to be added to the above indexOffset plus two (to skip the count), so it points to the (0xf0, 0x155) pair. This is an array of tables of four byte integers. We will call these the IND2 tables. The size of each of these tables is specified by the recordSize field of the "7c" header. The number of these tables is the above recordCount value derived from the "b5" block. Now the remaining data in the "7c" block after the header starts at offset 0x2a. There should be itemCount 8 byte items here, with the following format: The ind2Offset is a byte offset into the current IND2 table of some value. 
If that is a four byte integer value, then once we fetch that, we have the same triple (item type, reference type, value) as we find in the 0xbcec style descriptor blocks. If not, then this value is used directly. These 8 byte descriptors are processed recordCount times, each time using the next IND2 table. The item and reference types are as described above for the 0xbcec format descriptor block. 32 bit Associated Descriptor Item 0x0101 This descriptor block contains a list of ID1 values. It is used when an ID1 (that would normally point to a type 0x7cec or 0xbcec descriptor block) contains more data than can fit in any single descriptor of those types. In this case, it points to a type 0x0101 block, which contains a list of ID1 values that themselves point to the actual descriptor blocks. The total length value in the 0x0101 header is the sum of the lengths of the blocks pointed to by the list of ID1 values. The result is an array of subblocks, that may contain index references where the high order 16 bits specify which descriptor subblock to use. Only the first descriptor subblock contains the signature (0xbcec or 0x7cec). 64 bit Associated Descriptor Item 0x0101 This descriptor block contains a list of ID1 values.