The preliminary results using the standard and modified NLTK Bayesian classifier were almost suspiciously high. As a way to double-check the results, I used an NLTK feature that shows you the features that most […]
I presented an overview of my results in class, but wanted to take a minute to discuss my preliminary results here. My most meaningful set so far has been an attempt to classify texts by author using the complete works of […]
Python has been very easy to work with, but larger datasets are starting to cause problems. Some of the issues may be related to the IDLE shell that I am working with, although some research has shown many other […]
One of NLTK’s better features is the standard tokenizer. It integrates particularly well with the default dictionaries and corpora NLTK makes available. It only takes a single line to generate a list of stop […]
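As a rough illustration of tokenizing text and filtering out common function words, here is a minimal sketch in plain Python. It does not use NLTK: the regular-expression tokenizer and the tiny `STOP_WORDS` set are hand-written stand-ins chosen for this example. In NLTK itself the equivalents would be `nltk.word_tokenize` and `nltk.corpus.stopwords.words("english")`, which need the relevant NLTK data to be downloaded first.

```python
import re

# Hypothetical, hand-written stand-in for a real stop word list
# (this is NOT the list that ships with NLTK).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens, stop_words=STOP_WORDS):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in stop_words]


tokens = tokenize("The whale is the king of the sea.")
content_words = remove_stop_words(tokens)
print(content_words)
```

Using a set for the stop words keeps each membership check cheap, which matters once the input grows to whole books.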
One of the most irritating issues I’ve had with Python has been the stability of the default IDLE interface. I understand that most languages can be used to crash a machine, but it is far too easy in Python […]
NLTK offers a number of different models or object types to represent the data you’re working with. One of the most common types in NLTK is a corpus which represents a body of texts to analyze. Although it […]
While I was reading through the code of NLTK’s Bayesian classifier I noticed something a little unusual about it. Naive Bayes can be based on two different underlying models of the features. The first is a […]
I was originally working with WingIDE, but have almost completely switched over to IDLE. It is not as full-featured as something like Wing, but there are a few major pluses. First, it is part of Python, nothing […]
NLTK has a built-in corpus datatype, as well as access to a number of common corpora (like the Brown Corpus), many of which are already tagged. This makes it easy to jump in and practice using things like the […]
The first model that I am working with is based on naive Bayes with the frequency of words as the feature being measured. Although this is a standard baseline to work with, I am looking for some more sophisticated […]
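To make the word-frequency baseline concrete, here is a minimal sketch of a naive Bayes classifier over word counts, written in plain Python. It is not NLTK's implementation and not necessarily the exact model used in this project: the add-one (Laplace) smoothing, the function names `train` and `classify`, the labels `author_a` and `author_b`, and the toy sentences are all assumptions made up for illustration.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label. labeled_docs is an iterable of (tokens, label)."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of training documents
    vocab = set()
    for tokens, label in labeled_docs:
        doc_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(tokens, model):
    """Return the label with the highest log posterior for the given tokens."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior: share of training documents carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            if tok not in vocab:
                continue  # skip words never seen during training
            # Add-one smoothing so an unseen word/label pair is not zero.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up toy data; the labels are placeholders, not real authors.
training = [
    ("the whale and the sea".split(), "author_a"),
    ("the ship sailed the sea".split(), "author_a"),
    ("a ball and a dance".split(), "author_b"),
    ("the ball was a fine dance".split(), "author_b"),
]
model = train(training)
prediction = classify("a dance at the ball".split(), model)
print(prediction)
```

Working in log space avoids numeric underflow when the documents are whole books rather than short sentences.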
So for my real project I will need Moretti’s underlying data from his project. Without that, not only do I not have classifications, I don’t even know what books are being analyzed. So until then I chose to work […]
Python has a number of specialized IDEs which I was interested in trying. Since Python emphasizes ease of development, I was hoping someone had created an easy-to-use IDE. There are many good languages out there […]
Python currently exists in two main variations: Python version 2.x and version 3.x. Python 3 is now over 4 years old, but is not backwards compatible with Python 2. This means all programs and libraries written in […]
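A few well-known differences show why code cannot simply be moved between the two versions. The snippet below is a small sketch meant to be run under Python 3; the comments note how Python 2 behaved differently. The variable names are just for illustration.

```python
# print is a function in Python 3. The Python 2 statement form
# (print "hello") is a SyntaxError here.
print("hello")

# "/" is true division in Python 3; in Python 2, 7 / 2 on integers gave 3.
true_div = 7 / 2
floor_div = 7 // 2  # floor division, written the same way in both versions

# str is Unicode text in Python 3 and is distinct from bytes.
text = "caf\u00e9"
encoded = text.encode("utf-8")
lengths = (len(text), len(encoded))  # characters vs. UTF-8 bytes

# dict.keys() returns a view in Python 3; in Python 2 it returned a list.
keys = {"a": 1}.keys()
is_list = isinstance(keys, list)
```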
I’d like to use this space not only to record my experiences with the Natural Language Toolkit, but also with Python itself. Although I have extensive programming experience, I have never used Python, and it is […]
So the project that I am working on is a text classification project. I am currently working with Python and the Natural Language Toolkit to build some basic text classifiers. I am trying to repeat Moretti’s […]
Hello everyone. I am a part-time student in the MALS program in the Digital Humanities track. I am also a full-time systems engineer at a financial company, so I am coming from a fairly technical background. I […]
This reminded me of an article I read recently about children’s books and maps (and how the best books had maps). I remember maps being much more common in children’s books than in “adult” books. Maybe we were just […]