Hi Anders –
To do this most efficiently you need two things: a command-line utility that converts whatever file type you have into .txt files, and a shell script, macro or other automation tool to batch-process the files.
In the packet I created for the Digital Fellows Workshops, you’ll find two examples of just such a shell script. Each is designed to loop through the contents of a folder called ‘source’ (in the same directory as the script), processing one file at a time until the whole folder is done.
Let’s look at a version of ‘loop-rename.sh’ customized to turn html files into txt files. We’ll call the script ‘loop-textify.sh’:
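#!/bin/bash
# Convert every file in 'source' to text, writing the results into 'output'.
mkdir -p output
function largeloop ()
{ while IFS= read -r line1; do
# Options go before the input file; the quotes protect filenames with spaces.
html2text -o "output/$line1.txt" "source/$line1"
done
}
ls source | largeloop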
Just save that code as a file called ‘loop-textify.sh’ in the same folder that contains the ‘source’ folder. Then from the command line, make it executable and run it:
chmod +x loop-textify.sh
./loop-textify.sh
Note: I wrote the above example to use the open-source command-line tool ‘html2text,’ which can be found here: http://www.mbayer.de/html2text/readme.shtml
The script can also drive pdftotext or similar tools: just change the html2text line to call the tool you need, with that tool’s own arguments.
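For example, here is a rough sketch of a PDF version. It assumes pdftotext (shipped with Poppler or Xpdf) is installed and that your PDFs sit in the same ‘source’ folder. The function name textify_pdfs is a made-up name for illustration, and it uses a glob instead of ls so that only .pdf files are picked up:

```shell
#!/bin/bash
# Sketch only: batch-convert every PDF in a folder to plain text.
# Assumes pdftotext is on the PATH; 'source' and 'output' mirror the script above.

# textify_pdfs SRC_DIR OUT_DIR   (hypothetical helper name)
textify_pdfs () {
  local src="$1" out="$2" file name
  mkdir -p "$out"
  for file in "$src"/*.pdf; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$file" ] || continue
    # Strip the folder and the .pdf extension: source/a.pdf -> a
    name=$(basename "$file" .pdf)
    pdftotext "$file" "$out/$name.txt"
  done
}

# Only run when a 'source' folder sits next to the script.
if [ -d source ]; then
  textify_pdfs source output
else
  echo "No 'source' folder found here." >&2
fi
```

Unlike the html version, this one names each result file.txt rather than file.pdf.txt, which is only a matter of taste.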
Good luck!
PS My packet can be found here: http://www.mickikaufman.com/packet.zip