Besides what type of files you’re starting with, which operating system are you running? If you’re on a Mac and you’re dealing with .html, .rtf, .doc, or .docx files, you can use the built-in textutil command-line tool to batch convert documents. The following commands assume all your files are .docx files (change to .doc, .html, .rtf, etc. as necessary).
Open a terminal window (/Applications/Utilities/Terminal.app) and type the following (where /base/path/to/files is the directory where your files are stored, e.g. ~/Documents/filestoconvert would be the folder “filestoconvert” in the Documents folder within your home directory):
textutil -convert txt /base/path/to/files/*.docx
If you want to concatenate all of the text into a single .txt file, use this command instead:
textutil -cat txt /base/path/to/files/*.docx
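If I'm reading the textutil man page right, -cat writes its result to a file named out.txt in the current directory unless you say otherwise. Here is a minimal sketch that names the combined file explicitly with -output. The OUT location is just an assumption for illustration, and the check for textutil is only there so the snippet does nothing harmful on a non-Mac machine:

```shell
#!/bin/sh
# Sketch only: SRC is the same placeholder as above, OUT is a made-up location.
SRC="/base/path/to/files"
OUT="$HOME/Documents/combined.txt"

if command -v textutil >/dev/null 2>&1; then
    # Concatenate every .docx in SRC into one text file at OUT.
    textutil -cat txt -output "$OUT" "$SRC"/*.docx
else
    echo "textutil not found (it ships with macOS only)" >&2
fi
```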
If you need to convert files recursively within a directory structure, you can try this:
find /base/path/to/files -name '*.docx' -print0 | xargs -0 textutil -convert txt
Edit: If you want to concatenate the files recursively into a single long text file, use the previous command but replace “-convert” with “-cat”.
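One thing to be aware of with the recursive conversion: as far as I know, textutil puts each .txt next to its source file, so the results end up scattered through the directory tree. If you would rather collect them in one folder, something along these lines might work. DEST and the txt_path helper are hypothetical names I made up for the sketch, not anything built into textutil:

```shell
#!/bin/sh
# Sketch only: SRC, DEST and txt_path are made-up names for illustration.
SRC="/base/path/to/files"
DEST="$HOME/Documents/converted"

# Map a source path like /a/b/report.docx to <dest>/report.txt.
# $1 = source file, $2 = destination directory
txt_path() {
    printf '%s/%s.txt\n' "$2" "$(basename "$1" .docx)"
}

if command -v textutil >/dev/null 2>&1 && [ -d "$SRC" ]; then
    mkdir -p "$DEST"
    # Reads one path per line, so file names containing newlines are not handled.
    find "$SRC" -name '*.docx' | while IFS= read -r f; do
        textutil -convert txt -output "$(txt_path "$f" "$DEST")" "$f"
    done
else
    echo "Skipping: textutil (macOS only) or $SRC not found" >&2
fi
```

Note that two files with the same name in different subfolders would overwrite each other in DEST, so this only suits trees where the file names are unique.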
This reply was modified 8 years, 3 months ago by Keith Miyake.