Is there sample data available to go along with the Bad Data Handbook?

I'm reading the Bad Data Handbook on Safari Books and was wondering whether the data used in the examples is available on a site somewhere.

  • Hello Susan,

    Thanks for your interest in the Bad Data Handbook. There doesn't appear to be a dedicated site for the data used in the book, but I did find this at the end of chapter 6, on page 93:


    "All of my examples have used NLTK, Python’s Natural Language ToolKit, which you can find at I also train all my models using the scripts I created in nltktrainer at To learn how to do text classification and sentiment analysis with NLTK yourself, I wrote a series of posts on my blog, starting with And for those who want to go beyond basic text classification, take a look at scikit-learn, which is implementing all the latest and greatest machine learning algorithms in Python: For Java people, there is Apache’s OpenNLP project at, and a commercial library called LingPipe, available at"

    The author has also provided a link to his website, which can be found here: where you can contact him to ask whether this data is available.

