Slides -- Bioinformatics FAQ Bot

Bioinformatics FAQ Bot

FAQ Bot

Holiday Party Vegan Risotto

Vegan Risotto

Linux Phone for Christmas

I’m excited about my new Christmas present to myself, a Linux phone. I really hope I can cut down on the distractions and manipulations by my OS and app providers. And I may soon be able to integrate nlpia-bot into it (after I finish plugging it into mastodon.social). The decision between the Pine64 PinePhone and the Libre Phone developer edition was a no-brainer for a tinkerer like me.

Turning Points

I’m listening to the audiobook “Upheaval: Turning Points for Nations in Crisis” by Jared Diamond. Diamond talks at length about the decisions at turning points in his own life and how they parallel changes in governments and societies. And fall is when I start thinking about the turning points in my own life.

Slides -- A Smarter Chatbot

A Smarter Chatbot

nlpia-bot

Had a great time with Austin and Xavier mashing up Parul Pandey’s question-answering chatbot with nlpia-bot. She made it super simple.

Getting started with NLP

At Manceps our interns are building web and mobile apps to interface with their natural language model for unredacting the Mueller report. Here are some of the exercises they used to get up to speed on Python and NLP quickly.

Docvectors using spaCy for Springboard

One of my Springboard mentees asked how she could compute document vectors from the word2vec vectors available within a document object parsed by spaCy.
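spaCy’s `Doc.vector` defaults to the average of the token vectors, so with a model that ships word vectors (for example `en_core_web_md`), `nlp(text).vector` is already a document vector. The sketch below spells out that averaging in plain Python. `TOY_VECTORS` is a made-up stand-in for a real vector table, and skipping tokens that have no vector is a design choice of this sketch, not necessarily what spaCy does.

```python
from typing import Dict, List

# Hypothetical toy embedding table standing in for a real model's word vectors.
TOY_VECTORS: Dict[str, List[float]] = {
    "cats": [1.0, 0.0, 0.0],
    "chase": [0.0, 1.0, 0.0],
    "mice": [0.0, 0.0, 1.0],
}


def doc_vector(tokens: List[str], vectors: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of the tokens that have one; unknown tokens are skipped."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        # No known words: fall back to a zero vector of the table's dimension.
        dim = len(next(iter(vectors.values())))
        return [0.0] * dim
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


print(doc_vector("cats chase mice".split(), TOY_VECTORS))
```

Averaging throws away word order, but it is cheap and often a surprisingly strong baseline for document similarity.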

Nessvectors for San Diego Python User Group

I had a lot of fun playing with words at the monthly Python User Group meeting in San Diego this week. Congratulations to Torin Panick @torrinp for winning a free copy of NLP in Action. For those of you who missed out, I’ll give out one free eBook code and a 42% discount code next time. And I’ll be a bit more organized about the competition ;).

Unredact the Mueller Report?

What if the latest language models from Google were so good that they could unredact the Mueller Report? We gave it a shot at the monthly Portland Python User Group for May. BERT came up with some surprising results. The slides and code are here: PDF, ODP, py

Word Patterns

Word patterns are what you can use to match or generate phrases. In math and computer science courses they’re usually called grammars. But this post is about word grammars rather than character grammars. And the word “grammar” means something very precise to a lot of people, so I don’t want to step on any toes by using it incorrectly. So I’ll just talk about word patterns.
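To make “match or generate” concrete, here is a minimal sketch using a made-up pattern format (one list of allowed words per position, not the format from the post). The same pattern object drives both directions: matching checks each word against its slot, and generating walks every combination of slots.

```python
from itertools import product
from typing import Iterator, Sequence

# Hypothetical minimal pattern format: one list of allowed words per position.
Pattern = Sequence[Sequence[str]]

GREETING: Pattern = [["hi", "hello"], ["there", "friend"]]


def matches(pattern: Pattern, phrase: str) -> bool:
    """True if the phrase has one word per slot and each word is allowed in its slot."""
    words = phrase.lower().split()
    return len(words) == len(pattern) and all(
        word in options for word, options in zip(words, pattern)
    )


def generate(pattern: Pattern) -> Iterator[str]:
    """Yield every phrase the pattern can produce."""
    for words in product(*pattern):
        yield " ".join(words)


print(matches(GREETING, "Hello friend"))
print(list(generate(GREETING)))
```

Real grammars add optional slots, repetition, and nesting, which is where they start to look like regular expressions over words instead of characters.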

Infinite-vocabulary word embeddings

Word embeddings are at the core of the most impressive natural language models. Dialog systems, abstractive summarizers, universal sentence embeddings, question answering systems, and even unsupervised knowledge extraction engines all rely on broad vocabularies of word embeddings. But even the million-word vocabularies of Word2vec and GloVe embeddings aren’t broad enough for some of the most useful natural language processing challenges, such as medical record summarization, or even dialog engines that can keep up with the ever-expanding vocabulary of teenagers.
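One well-known way to give every string a vector is the fastText approach: represent a word as the sum or average of vectors for its character n-grams, hashed into a fixed number of buckets. The sketch below shows roughly that idea; it is a swapped-in technique, not necessarily what this post builds. `DIM`, `BUCKETS`, and the random bucket vectors are illustrative stand-ins for trained weights.

```python
import random
import zlib
from typing import List

DIM = 16           # embedding size (illustrative)
BUCKETS = 2 ** 12  # number of hashed n-gram buckets (illustrative)


def char_ngrams(word: str, n_min: int = 3, n_max: int = 5) -> List[str]:
    """Character n-grams of the word wrapped in boundary markers, fastText-style."""
    wrapped = f"<{word}>"
    return [
        wrapped[i : i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(wrapped) - n + 1)
    ]


def bucket_vector(bucket: int) -> List[float]:
    """Deterministic pseudo-random vector per bucket (stands in for trained weights)."""
    rng = random.Random(bucket)
    return [rng.gauss(0.0, 1.0) for _ in range(DIM)]


def word_vector(word: str) -> List[float]:
    """Average of the hashed n-gram vectors, so even unseen words get a vector."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * DIM
    total = [0.0] * DIM
    for gram in grams:
        # crc32 is stable across runs, unlike Python's salted hash() for str.
        vec = bucket_vector(zlib.crc32(gram.encode("utf-8")) % BUCKETS)
        total = [t + v for t, v in zip(total, vec)]
    return [t / len(grams) for t in total]


print(char_ngrams("cat", 3, 3))
print(len(word_vector("rizz")))  # a slang word no 2013-era vocabulary contains
```

Because related spellings share most of their n-grams, their vectors land close together even if neither word was in the training vocabulary.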

SSH Server On Office PC Behind Building's NAT Router

Say you’re leasing space in an office building for your startup and you share the network with all the other tenants. This could be a wireless router or a hard-wired Ethernet router. The problem is you don’t have the password for the admin page on that router, so you can’t expose a port on your server for SSH or web hosting or whatever. Normally you’d just add a port-forwarding rule on the router to send ports 22, 80, and 443 through to your server. But that might break things for somebody else using the same router to serve up their own site.
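One common workaround that needs no router access (an assumption here; the excerpt doesn’t say which approach the post takes) is a reverse SSH tunnel from the office PC to a machine with a public IP that you control. The sketch only builds the `ssh` command; `relay.example.com`, the user name, and the ports are placeholders.

```python
import shlex
from typing import List


def reverse_tunnel_cmd(relay_host: str, relay_user: str,
                       remote_port: int = 2222, local_port: int = 22) -> List[str]:
    """Build an `ssh -R` command that exposes this machine's local_port
    as remote_port on a relay host you control (placeholder values)."""
    return [
        "ssh",
        "-N",                                  # no remote command, just forward ports
        "-o", "ExitOnForwardFailure=yes",      # exit if the remote port can't be bound
        "-o", "ServerAliveInterval=60",        # keepalives so idle NAT mappings survive
        "-R", f"{remote_port}:localhost:{local_port}",
        f"{relay_user}@{relay_host}",
    ]


if __name__ == "__main__":
    cmd = reverse_tunnel_cmd("relay.example.com", "me")  # hypothetical relay host
    print(shlex.join(cmd))
    # To actually open the tunnel: import subprocess; subprocess.run(cmd, check=True)
```

From the relay you would then run `ssh -p 2222 youruser@localhost` to reach the office PC. By default sshd binds remote forwards to the relay’s loopback only, which is usually what you want.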

Data Science Trends

Springboard Data Science Careers students keep asking me which specialization they should pursue. And they often want to know which specializations are most likely to hire a junior data scientist coming out of Springboard. I try to encourage my students to pursue something they are good at, because there will always be a market for someone who is good at what they do. But if you really want to follow the crowd and go where the employers are hiring, check out the AIIndex.org 2018 report. It looks like NLP was popular in 2016 and 2017 but may be overtaken by computer vision and “deep learning” by 2020. This roughly corresponds to the widespread deployment of self-driving cars, which will eventually replace approximately 10% of the US workforce with machines. And those driving and logistics jobs have been “transformed” into data science jobs over the past few years. So if you’re a full-time Lyft driver, now might be a good time to start taking night classes in data science and getting reconnected with your nerdy friends.

Nginx web server setup

Each time I have to set up a domain name service table or database access for a web server, I forget how to do it. And there doesn’t seem to be a good online guide for it. So here are my notes.

Raspberry Pi Camera Configuration

I’ll eventually figure out where I put my notes on configuring a Raspberry Pi camera for streaming video and offline object detection. But for now, check out the BerryNet repo. These guys have done it right!

Open Source Teleprompter?

I’m recording some instructional videos for a Natural Language Processing in Motion course for Manning Publications and maybe a Data Science for Healthcare course for UCSD. I tried using Camtasia to simultaneously record the slides on one monitor and my talking head (webcam). And I tried using LibreOffice in presentation mode to show the slides and animations on my laptop screen while reading the slide notes on the external display. But in presentation mode LibreOffice puts the notes on the right, in the middle of the screen, so my eyes weren’t looking at the camera. Is there a better way to set up a “teleprompter” and webcam so that the top line is always near the webcam? I’d prefer open source and free or low cost.

Poetix

A big thanks to Philip R. Baldwin for sharing this clever AI-generated sonnet.

Sentence Embedding

Sentence embeddings took off in 2017. When Google released its Universal Sentence Encoder last year, researchers took notice. Google trained its sentence embedding on a massive corpus of text, everything from Wikipedia and news articles to FAQs and forums. Then they refined the accuracy by training it on the Stanford Natural Language Inference (SNLI) corpus. Like word2vec, this enabled NLP enthusiasts to leverage Google’s text-scraping and cleaning infrastructure to build their own models using transfer learning. Transfer learning is just a fancy way of saying you use one model within another. Usually you’re just doing “activation” or “inference” with the pretrained model and then using its output as a feature (input) for some other model.
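Here is a toy sketch of that pattern. A frozen “pretrained” encoder only does inference, and a second, much smaller model is trained on its outputs. The encoder here is a hypothetical hand-written vector table standing in for something like the Universal Sentence Encoder, and the downstream model is a nearest-centroid classifier chosen only because it fits in a few lines.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical frozen "pretrained" encoder weights: never updated below.
PRETRAINED: Dict[str, Vector] = {
    "great": [1.0, 0.0], "love": [0.9, 0.1],
    "awful": [0.0, 1.0], "hate": [0.1, 0.9],
}


def encode(sentence: str) -> Vector:
    """Inference only: average the frozen vectors of known words (2-D toy space)."""
    vecs = [PRETRAINED[w] for w in sentence.lower().split() if w in PRETRAINED]
    if not vecs:
        return [0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def fit_centroids(examples: List[str], labels: List[str]) -> Dict[str, Vector]:
    """The only trained part: one centroid per label, computed on encoder outputs."""
    grouped: Dict[str, List[Vector]] = {}
    for text, label in zip(examples, labels):
        grouped.setdefault(label, []).append(encode(text))
    return {lab: [sum(c) / len(vs) for c in zip(*vs)] for lab, vs in grouped.items()}


def predict(centroids: Dict[str, Vector], sentence: str) -> str:
    """Label whose centroid is closest (squared Euclidean) to the sentence embedding."""
    v = encode(sentence)
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(v, centroids[lab])))


centroids = fit_centroids(["I love it", "great stuff", "I hate it", "awful stuff"],
                          ["pos", "pos", "neg", "neg"])
print(predict(centroids, "love this, great"))
```

Swapping the toy `encode` for a real pretrained encoder leaves the downstream training code unchanged, which is the whole appeal of transfer learning.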

NLP Word Usage Trends

Here are some example Google N-gram Viewer queries that I used while researching the NLP in Action book, including one to decide how to spell “n-gram” ;)

NLP Hacks for Writers

Default to Open

You Decide

Hyperspace Topology Games

Play around with these geometries in your brain. Then see what happens for real when you do this with high dimensional vectors, like word vectors (Mikolov).
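One geometry game worth running for real: pick pairs of random directions and measure the angle between them. In 2-D they point every which way, but as the dimension grows they become nearly orthogonal (mean |cosine| shrinks roughly like 1/sqrt(d)). This sketch uses Gaussian random vectors as a stand-in for word vectors; the sample sizes are arbitrary.

```python
import math
import random


def mean_abs_cosine(dim: int, pairs: int = 200, seed: int = 0) -> float:
    """Average |cosine similarity| between pairs of random Gaussian vectors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        a = [rng.gauss(0, 1) for _ in range(dim)]
        b = [rng.gauss(0, 1) for _ in range(dim)]
        dot = sum(x * y for x, y in zip(a, b))
        norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
        total += abs(dot) / norm
    return total / pairs


for dim in (2, 10, 100, 1000):
    print(dim, round(mean_abs_cosine(dim), 3))
```

That near-orthogonality is why a few hundred dimensions leave plenty of room for word vectors to encode many independent directions of meaning.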

git

Git

Abstract Hyperparameter Optimization Machines Better Than Humans

Advances in neural networks and deep learning have renewed interest in algorithms that automate the tuning of the expanding list of hyper-parameters for these high-dimensional models. Open source libraries such as scikit-learn provide ready access to simple but inefficient algorithms such as exhaustive (grid) search and random search. Recently, Snoek et al. showed that statistical hyper-parameter optimization approaches produce better results than humans and are more efficient than exhaustive or random approaches in high-dimensional domains such as image and speech machine learning.[1] Similarly, Bergstra et al. improved efficiency and performance further with their Sequential Model-Based Global Optimization (SMBO) approach, which approximates the computationally expensive model-training step with a cheaper surrogate model.[2] In this paper we will demonstrate these hyper-parameter optimization algorithms on several toy and real-world problems, including machine learning problem types not previously optimized with SMBO.
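For reference, the random-search baseline mentioned above fits in a few lines. Bayesian optimization and SMBO keep the same outer loop but replace the uniform sampling with a surrogate model that proposes the next trial. `toy_loss` is a hypothetical stand-in for “train a model and return its validation loss”, and the search ranges are made up.

```python
import random
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def random_search(loss: Callable[[Params], float], space: Dict[str, Tuple[float, float]],
                  trials: int = 500, seed: int = 0) -> Tuple[Params, float]:
    """Baseline random search: sample each hyper-parameter uniformly, keep the best."""
    rng = random.Random(seed)
    best_params: Params = {}
    best_loss = float("inf")
    for _ in range(trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        value = loss(params)
        if value < best_loss:
            best_params, best_loss = params, value
    return best_params, best_loss


def toy_loss(p: Params) -> float:
    """Hypothetical validation loss, minimized at dropout=0.3 and learning rate 1e-2."""
    return (p["dropout"] - 0.3) ** 2 + (p["log10_lr"] + 2.0) ** 2


best, value = random_search(toy_loss, {"dropout": (0.0, 0.9), "log10_lr": (-5.0, 0.0)})
print(best, value)
```

Searching the learning rate on a log10 scale is a common choice, since its useful values span several orders of magnitude.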

Gluten-Free Antioxidant Oatmeal Cookies

Gluten-Free Oatmeal Cookies

XZ 7Zip Performance on the Latest Python 3.6 Source Code Release

With the Python 3.6 release today, I noticed the source package used a compression extension I wasn’t familiar with, .xz. Turns out it uses LZMA2, the algorithm behind 7-Zip, in its own container format; the ’nix file metadata (owner, permissions, sticky bit, etc.) is carried by the tar archive inside it. So I played around with it to see how it performs at its maximal and “extreme” maximal compression levels.
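Python’s standard `lzma` module writes the same .xz format, so the two levels can be compared without the command-line tool. Preset 9 corresponds to `xz -9`, and OR-ing in `lzma.PRESET_EXTREME` corresponds to `xz -9e`. The sample bytes below are a made-up stand-in for the real source tarball.

```python
import lzma
from typing import Dict


def xz_sizes(data: bytes) -> Dict[str, int]:
    """Compressed size at xz level 9 and level 9 'extreme' (like `xz -9` vs `xz -9e`)."""
    sizes = {}
    for label, preset in (("-9", 9), ("-9e", 9 | lzma.PRESET_EXTREME)):
        packed = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
        assert lzma.decompress(packed) == data  # lossless round trip
        sizes[label] = len(packed)
    return sizes


sample = b"def hello():\n    return 'world'\n" * 2000  # stand-in for source files
print(len(sample), xz_sizes(sample))
```

Level 9 uses a large dictionary, so compression needs a lot of RAM; the extreme flag mostly trades extra CPU time for a few more bytes.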

Hyper-Indexing with LSHash (Locality Sensitive Hashing)

Indexing topic vectors from an LSI model is more difficult than it seems. My first instinct was to use PostGIS, the spatial (2D/3D) indexing extension for PostgreSQL. After all, that’s the typical example I keep in my head for indexing. You create a discrete “on or off” label for each location based on whether it is present or absent within a grid cell. This lets you efficiently find it (and any nearby points) with a query like WHERE grid = 'A11', the letter/number 2D indexing system you see on old AAA paper road maps.
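Locality sensitive hashing replaces the map grid with random hyperplanes: each plane contributes one bit saying which side of it a vector falls on, so vectors pointing in similar directions tend to share a key that can be looked up like `WHERE grid = 'A11'`. This is a hand-rolled sketch of random-hyperplane hashing (the idea LSHash is built around), not the LSHash API; the topic vector and sizes are made up.

```python
import random
from typing import List, Sequence


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Random hyperplanes through the origin, one per hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]


def lsh_key(vec: Sequence[float], planes: List[List[float]]) -> str:
    """One bit per plane: '1' if the vector is on the plane's positive side.
    Nearby directions (high cosine similarity) usually get the same key."""
    return "".join(
        "1" if sum(p * v for p, v in zip(plane, vec)) >= 0 else "0" for plane in planes
    )


planes = make_hyperplanes(dim=4, n_bits=8)
topic_vec = [0.9, 0.1, 0.0, 0.3]  # hypothetical LSI topic vector
print(lsh_key(topic_vec, planes))
```

Scaling a vector doesn’t change its key, which suits cosine similarity. In practice you use several independent tables so near neighbors that land in different buckets in one table still collide in another.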

PyDX is Awesome!

Watched a lot of great Python talks at PyDX this weekend. Here are some memorable ones:

Automation-Safer-Than-Manual

Interactive automation is much better than fully manual keyboard bashing for a lot of Linux tasks. It’s taken decades, but many Linux distributions have finally made it possible to install Linux automatically without too much hassle. But other mundane tasks, like adding or swapping out a hard drive, are a real bear. And the online instructions (especially on Canonical’s Ubuntu docs site) sound overly protective and cautious, encouraging the user to do everything by hand instead of automating things with a script. And they often get critical steps wrong, endangering your data and your computer.

Python-Birth-Microsecond-Paradox

Cole got bitten by the Birthday Paradox when using Python’s random.randint() and time.time() to generate a random number to tag a DB record with a unique ID. I think Hannes does something similar to ensure user-provided files are all unique, even if a user uploads the exact same file twice.
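The arithmetic behind the bug: the chance that n IDs drawn uniformly from N possible values are not all distinct is 1 - (N/N)((N-1)/N)...((N-n+1)/N), which grows much faster than intuition suggests. For IDs that really must be unique, `uuid.uuid4()` (122 random bits) or a database sequence is a safer default. The 10,000-of-a-million case below is an illustrative scenario, not Cole’s actual numbers.

```python
import uuid


def collision_probability(n_ids: int, space: int) -> float:
    """Chance that n_ids values drawn uniformly from `space` options contain a repeat."""
    p_unique = 1.0
    for k in range(n_ids):
        p_unique *= (space - k) / space
    return 1.0 - p_unique


print(round(collision_probability(23, 365), 3))   # classic birthday paradox
print(collision_probability(10_000, 10 ** 6))     # 10k IDs from a million values
print(uuid.uuid4())                               # practically collision-free ID
```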

Now THAT's Open Data -- The Google NGram Viewer Corpus

Now THAT’s Open Data!

Comparison of Hybrid Mobile App Javascript Frameworks

Zak and I are building frontends for DRF that we’d like to go mobile with eventually.

History Temp Panic

ls /usr/local rm siteconf.p ls /usr/include/boost ls /usr/include/ sudo apt install boost sudo apt install python-boost workon hope pip3 install boost pip3 search boost exit nvcc nvcc --help history | grep BLAS exit ls -al cd src ls -al tar -xvzf boost_1_61_0.tar.gz cd boost_1_61_0/ ls -al sudo apt install python3-dev sudo apt install python3-devel sudo apt install python-dev history | grep BLAS exit history | grep BLAS sudo apt search BLAS sudo apt install libopenblas-* cd src ls -al # sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb sudo apt-get install linux-headers-$(uname -r) sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb sudo apt uninstall cuda sudo apt-get uninstall cuda sudo apt-get remove cuda ls /usr/local/ ls /usr/local/cuda ls /usr/local/cuda/README more /usr/local/cuda/README more /usr/local/cuda/version.txt more /usr/local/cuda-7.5/version.txt sudo apt-get --purge remove cuda-7.5 ls /usr/local/cuda/README ls /usr/local/cuda/ ls /usr/local/cuda-7.5/ sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb sudo apt-get update sudo apt-get install cuda sudo shutdown -r now modprobe nvidia more /etc/modules sudo modprobe nvidia sudo modprobe nvidia-uvm ls /usr/local nano /etc/bash.bashrc exit nano .theanorc workon hope ipython python3 -c 'import theano; theano.test()' exit update-alternatives g++ update-alternatives --list update-alternatives --list g++ update-alternatives --list gcc gcc -V gcc -v g++ -v exit htop PW="`echo $UN | md5sum | cut -c2-7`" UN=hannes PW="`echo $UN | md5sum | cut -c2-7`" echo $UN | md5sum | cut -c2-7 echo zak | md5sum | cut -c2-7 echo cole | md5sum | cut -c2-7 echo hannes | md5sum | cut -c2-7 exit sudo apt-get install mpixx mpixx sudo apt-get install libopenmpi-dev cd src cd devops/ ls -al cd scripts/ ls nano addusers chmod +x addusers sudo ./addusers hannes cole matt erin andrew sudo ./addusers riley cd /usr/local/cuda-7.5/samples/ ls -al cd bin ls -al cd x86_64/ ls cd linux/ ls cd release/ ls cd .. 
cd release ./deviceQuery ./bandwidthTest sudo apt install cuda-gdb-src sudo apt search cuda-gdb-src sudo apt-get install cuda-gdb-src cd ~/src/ ls sudo apt-get update sudo apt-get install cuda-gdb sudo apt-get install cuda-command-line-tools-7-5-src sudo apt-get install cuda-command-line-tools-7-5-devel sudo apt-get install cuda-command-line-tools-7-5-dev echo -e "\n[nvcc]\nflags=-D_FORCE_INLINES\n" >> ~/.theanorc workon hope pip install Theano pip uninstall Theano pip remove Theano pip purge Theano pip cleanup pip install Theano python3 -c "import theano;" python3 -c "import theano; theano.test()" cd ../devops/ nano configure_theano chmod +x configure_theano ./configure_theano sudo apt install gcc-5.2 sudo apt install gcc-5.2.1 sudo apt install --update gcc sudo apt install --upgrade gcc which gcc ls /usr/bin/gcc /usr/bin/gcc /usr/bin/gcc -v python -c "import theano; theano.test()" nano ~/.theanorc python -c "import theano; theano.test()" pip install git+https://github.com/dnouri/nolearn.git@master#egg=nolearn==0.7.git pip install scipy cd .. cd devops/ ls mv configure_theano scripts/ cd .. sudo cp .theanorc ~hannes/ sudo su hannes sudo chown hannes:hannes ~hannes/src/* sudo chown hannes:hannes ~hannes/.theanorc sudo chown -R hannes:hannes ~hannes/src/ sudo su hannes pip install lasagna pip install Lasagna pip install https://github.com/Lasagne/Lasagne/archive/master.zip pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip sudo pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip deactivate exit cd src cd devops/ ls cd scripts/ ls addusers thunder sudo ./addusers thunder more addusers echo hannes | md5sum | cut -c2-7 echo thunder | md5sum | cut -c2-7 exit ls cd src ls tar -xvzf cudnn-7.5-linux-x64-v5.1-rc.tgz cd cuda ls -al export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH exit env env | grep LD nano /etc/bash.bashrc nvcc -V ls -al cd src ls -al cd cuda ls -al cd .. 
cd /usr/local ls -al cd cuda-7.5/ ls -al cd samples ls -al make gcc -V gcc -v sudo update-alternatives --remove-all gcc sudo update-alternatives --remove-all g++ sudo apt-get install gcc-4.9 sudo update-alternatives --remove-all g++ sudo apt-get install g++-4.9 sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 10 sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 10 make sudo make cd ~/ workon hope pip install -r https://raw.githubusercontent.com/dnouri/nolearn/master/requirements.txt pip install git+https://github.com/dnouri/nolearn.git@master#egg=nolearn==0.7.git ipython pip freeze > requirements.gpu.txt more requirements.gpu.txt nano requirements.gpu.txt cp requirements.gpu.txt ../../../hannes/ sudo cp requirements.gpu.txt ../../../hannes/ sudo chmod a+r ../../../hannes/requirements.gpu.txt sudo chmod a+w ../../../hannes/requirements.gpu.txt sudo chmod a+d ../../../hannes/requirements.gpu.txt sudo chmod a+x ../../../hannes/requirements.gpu.txt sudo su hannes sudo pip3 install theano sudo pip3 install --upgrade pip sudo pip3 install --upgrade theano cd src sudo su hannes sudo pip3 install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip sudo pip3 install --upgrade https://github.com/Theano/Theano/archive/master.zip ls -al cd devops sudo nano test_theano_gpu.py python3 test_theano_gpu.py cd .. cd devops mv test_theano_gpu.py scripts/ cd .. 
mv cuda cuda_base tar -xvzf cudnn-7.5-linux-x64-v5.1-rc.tgz ls -al cd cuda which nvcc ls -al sudo cp -P include/cudnn.h /usr/local/cuda-7.5/include sudo cp -P lib64/libcudnn* /usr/local/cuda-7.5/lib/x86_64-linux-gnu/ sudo cp -P lib64/libcudnn* /usr/local/cuda-7.5/lib64/ sudo chmod a+r /usr/local/cuda-7.5/lib64/libcudnn* cd /usr/local/cuda-7.5/ ls -al cd ~/src/devops ls -al cd scripts/ git status ls -al sudo chmod +x test_theano_gpu.py ./test_theano_gpu.py sudo pip install twip sudo pip install pug-nlp workon hope ls -al cd hope ls -al cd hope ls -al cd .. htop w sudo su hannes python ipython write hannes "noticed that your env defaults to python2.7 mine is python 3.4.3 for all the root-install theano/nolearn stuff" sensors write hannes < `sensors` sensors | write hannes write hannes "Nothing to worry about yet, but CPU is getting hot. Type `sensors` to see status" write hannes "Nothing to worry about yet, but CPU is getting hot. Type 'sensors' to see status" write hannes "I have $2 CPU cooler/radiator from free geek without any thermal paste, etc" write hannes "So probably need to get a water-cooled radiator if you are going to run it continuously for long periods" sensors sensorsensor sensors htop sensors

Cpu Gpu Temp Sensors Log

(hope)hobs@hobs-black-gpu:~/src/hope/hope$ write hannes "noticed that your env defaults to python2.7 mine is python 3.4.3 for all the root-install theano/nolearn stuff"
write: hannes is not logged in on noticed that your env defaults to python2.7 mine is python 3.4.3 for all the root-install theano/nolearn stuff
(hope)hobs@hobs-black-gpu:~/src/hope/hope$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +75.0°C (high = +77.0°C, crit = +87.0°C)
Core 0: +62.0°C (high = +77.0°C, crit = +87.0°C)
Core 1: +69.0°C (high = +77.0°C, crit = +87.0°C)
Core 2: +67.0°C (high = +77.0°C, crit = +87.0°C)
Core 3: +71.0°C (high = +77.0°C, crit = +87.0°C)
Core 4: +64.0°C (high = +77.0°C, crit = +87.0°C)
Core 5: +67.0°C (high = +77.0°C, crit = +87.0°C)
Core 6: +64.0°C (high = +77.0°C, crit = +87.0°C)
Core 7: +65.0°C (high = +77.0°C, crit = +87.0°C)
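To watch temperatures from a script instead of eyeballing them, the `sensors` lines in the format shown above are easy to parse. This sketch flags any sensor within `margin` degrees of its “high” threshold; the regex only covers the coretemp line format in this log, and the 5-degree margin is an arbitrary choice.

```python
import re
from typing import List, Tuple

# Matches lines like "Core 3: +71.0°C (high = +77.0°C, crit = +87.0°C)".
TEMP_LINE = re.compile(
    r"^(?P<name>[^:]+):\s*\+?(?P<temp>-?\d+(?:\.\d+)?)°C\s*"
    r"\(high = \+?(?P<high>-?\d+(?:\.\d+)?)°C, crit = \+?(?P<crit>-?\d+(?:\.\d+)?)°C\)"
)


def hot_sensors(output: str, margin: float = 5.0) -> List[Tuple[str, float]]:
    """Return (sensor, temp) pairs within `margin` degrees of their 'high' threshold."""
    hot = []
    for line in output.splitlines():
        m = TEMP_LINE.match(line.strip())
        if m and float(m["temp"]) >= float(m["high"]) - margin:
            hot.append((m["name"], float(m["temp"])))
    return hot


sample = """Physical id 0:  +75.0°C  (high = +77.0°C, crit = +87.0°C)
Core 0:         +62.0°C  (high = +77.0°C, crit = +87.0°C)
Core 3:         +71.0°C  (high = +77.0°C, crit = +87.0°C)"""
print(hot_sensors(sample))
```

In practice you would feed it the output of `subprocess.run(["sensors"], capture_output=True, text=True).stdout` on a timer.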

PenTesting Peanut Gallery

Really enjoyed getting a crash course in InfoSec and PenTesting from Dean at the Ctrl-H HackerSpace meetup. Here’s how to get some tools for easy, ethical hacking.

Wildlife Survey and Cowboy Drone

I spend a lot of time hiking around in the snow taking pictures of animal tracks and maintaining wildlife survey cameras for Cascadia Wild. And I can’t help but daydream about Drone/Robot assistants doing a lot of this for me.

Upgrading 14.04 to 15.10 on a Dual-Boot HP Spectre Laptop

Tune Down the Trackpad

Dual Boot HP Spectre x360 Laptop

I love my new Spectre laptop with the fold-back screen. It’ll make an awesome picture frame or navigation tablet at the end of its life. But to keep it relevant I configured it to dual-boot Ubuntu. I need Windows 10 because QuickBooks still hasn’t gotten with the open source program.

HPC on a Budget

The halfling (half-length) PCIe NVIDIA GeForce GTX 970 card I ordered needed only one PCIe 3.0 slot, but it also needed physical clearance for its connectors to poke out through two slots in the back of the chassis. So form-factor planning can be a bitch. The Free Geek chassis I’m using has all its PCI slots free (including a PCIe 2.0 slot), so there are plenty of holes in the metal chassis, but the PCIe 2.0 slot is at the wrong end of the row, and the NVIDIA card needs the blocked side for its connectors. Back to Newegg she goes.

Review

PyPi Packaging with PyScaffolding

PyScaffold (pip install PyScaffold or pyscaffold) is awesome tooling. It adds a nice putup command to your shell. The putup command creates a boilerplate directory structure for any Python project. It can even set up tox and Travis test config files, documentation build scripts, and a Django project for you, if you ask it to. And it is very git-aware. The only thing I add to my git hooks is a pandoc line to translate my README.md into README.rst so that both my GitHub-trained fingers and reST-loving PyPI can be happy.

Rabbit Hole of Automation

I got carried away with automating my development process when I discovered this pre-commit hook that makes sure your Python imports are sorted, as Two Scoops of Django recommends. A hooks.yaml file in the repo revealed that FalconSocial’s hook is actually a plugin for Yelp’s awesome pre-commit framework.

Neural Net Brainstorm

Cole’s class on neural nets inspired some “out of the box” thinking about how brains work and how we train neural nets. Students asked about the performance of regularization vs random dropout, and the computational bottlenecks for random dropout.
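A minimal pure-Python sketch of the two options the students compared: inverted dropout (the common formulation, assumed here rather than taken from the class) and an L2 weight penalty. In this naive version, dropout draws one random number per unit per training step, which is where its extra cost comes from; the L2 penalty is deterministic arithmetic on the weights.

```python
import random
from typing import List, Optional


def dropout(activations: List[float], p: float,
            rng: Optional[random.Random] = None) -> List[float]:
    """Inverted dropout: zero each unit with probability p and scale survivors
    by 1/(1-p) so the expected activation is unchanged (training time only)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else a * scale for a in activations]


def l2_penalty(weights: List[float], lam: float) -> float:
    """Regularization alternative: lam * sum(w^2), added to the training loss."""
    return lam * sum(w * w for w in weights)


print(dropout([1.0, 2.0, 3.0, 4.0], p=0.5, rng=random.Random(0)))
print(l2_penalty([1.0, 2.0], lam=0.1))
```

At inference time dropout is simply switched off; the 1/(1-p) scaling during training is what makes that work without rescaling the weights afterwards.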

Smaller than Baby Steps with Julia

Julia has some impressive performance stats, so I gave it a whirl, or half a whirl.

HUML Day 4 -- Natural Language Processing

Finally Rolling

git

Git

Machine Learning Introduction

Hack University Machine Learning Introduction

Your Own Private Cloud and NAS Drive

The Buffalo Airport Extreme is pretty expensive ($100), but when coupled with a cheap multi-TB USB 3.0 drive, it makes a pretty nice personal cloud. You can even download all of the Wikipedia and Wikimedia Commons dumps directly to the drive without passing through your precious laptop SSD. 10 Mbps rates are no problem for most USB 3.0 drives.

Getting Started with your PiBot TiddlyBot

I helped my teenage nephew get started on his Kickstarter TiddlyBot Christmas present over the holidays. We used a Linux laptop (Ubuntu) and recorded all the tedious setup steps so you can spend more time programming your bot and less time getting set up.

B-Machine Learning

The “B” isn’t for Bot, it’s for “Benefit”, as-in B-Corporation. What do B-Corps have to do with Machine Learning?

Inspiring Night -- John Irving Explaining his Craft

It was inspiring, almost magical, listening to John Irving explain his art and his insight into life at the Portland Art Museum. OPB hosted him with the towering church organ of the First Congregational United Church of Christ as a backdrop. John Irving’s intellect and humor eventually dwarfed the organ.

Hacking Oregon's Hidden Political Connections

Hacking Oregon’s Hidden Political Connections

Notes from Data User Group Meetup -- Text Mining Meets Neural Nets

Here are my notes from the Data User Group and PDX Data Engineering Meetup presentation titled “Text Mining Meets Neural Nets: Mining the Bio-medical Literature”, presented by Dan Sullivan, the enterprise architect for Cambria Health and Ph.D. student at Virginia Tech (the Biomed Institute).

TFNW BYOB

AI Solves Problems for Which there are No Known Efficient Solutions

I’m not a huge fan of the “Daisy AI Podcast” but he often rattles off a lot of interesting information quickly, like in his 2013 podcast.

Awesome-Data-Mining-Introduction

I loved this blog post by Raymond Li that Aleck forwarded tonight: Top 10 Data Mining Algorithms. It’s approachable even by people who’ve never used any of these tools. And yet it’s so rich with information that I learned about some new techniques I’d never heard of and it cleared up some misconceptions I had about some algorithms (SVMs in particular).

Neural Nets Demystified

Draft of Neural Nets Demystified

Neural Nets Demystified

Purchasing Electronics with BitCoin

The “withdrawal” option on Kraken worked well when I used it to purchase a “refurbished” Brother laser printer on NewEgg. All you need to do is

Gaussian Mixture Model

Working on this Kaggle challenge (Otto Product Categorization), it’s becoming clear that the most appropriate hard-coded model is a Bayesian classifier. And you don’t need the “gamification” clues to tell you that, though the clues helped. “I’m a strict Bayesian, you know” was the acknowledgment message I received last week with my first decision-tree submission (within spitting distance of the benchmark). Clever. I love Kaggle for this! For the same reason I love Stack Overflow… they use influence techniques for the TotalGood rather than focusing on monetization (their own financial gain).
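A Bayesian classifier picks the class with the highest posterior, which is proportional to P(class) × P(features | class). The sketch below is a one-feature Gaussian version with made-up data, a deliberate simplification of the Otto data, which has many features. A Gaussian mixture model would replace each class’s single Gaussian with a weighted sum of several.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Model = Dict[str, Tuple[float, float, float]]  # label -> (prior, mean, variance)


def fit(xs: List[float], ys: List[str]) -> Model:
    """Per class: prior probability plus mean and variance of a 1-D Gaussian."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for x, y in zip(xs, ys):
        groups[y].append(x)
    model = {}
    for label, vals in groups.items():
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals) + 1e-9  # avoid zero variance
        model[label] = (len(vals) / len(xs), mean, var)
    return model


def predict(model: Model, x: float) -> str:
    """Class with the largest log posterior: log prior + log Gaussian likelihood."""
    def log_post(label: str) -> float:
        prior, mean, var = model[label]
        return math.log(prior) - 0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
    return max(model, key=log_post)


model = fit([1.0, 1.2, 0.8, 5.0, 5.5, 4.5], ["a", "a", "a", "b", "b", "b"])
print(predict(model, 1.1), predict(model, 4.9))
```

Working in log space avoids underflow when the likelihoods of many features are multiplied together.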

Connect Mac WiFi with Comcast Motorola Surfboard Extreme SBG121 or SBG6580

Larissa and house guests are often complaining about sluggish Internet with our Comcast Motorola Router and Modem. So I tried a lot of things. In the end, I think it was the “IP Flooding” filter that was gumming up the works.

Data Science Group Talk -- Neural Nets Demystified

Portia Burton asked me to speak about Neural Nets at the next Data Science Group meetup. So here’s the abstract…

Dev Resources

Keeping Up

Soul Food

Curry Chicken Sandwiches

Model and Diagram Any Database Using SQLAlchemy

I needed to model and diagram (ERD) a client's database schema in order to understand their machine learning task. They don't use Django, so I can't just `manage.py inspectdb` and [`manage.py graph_models`](http://django-extensions.readthedocs.org/en/latest/graph_models.html). But fortunately, SQLAlchemy makes both of these tasks easy.
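SQLAlchemy’s `MetaData.reflect()` reads tables and foreign keys from most database backends. To keep this sketch dependency-free it swaps in the standard-library `sqlite3` module and SQLite’s `PRAGMA foreign_key_list`, then emits a Graphviz DOT graph for the ERD. The `customer`/`purchase` schema is made up for illustration.

```python
import sqlite3
from typing import List, Tuple

Edge = Tuple[str, str, str, str]  # (table, column, referenced_table, referenced_column)


def foreign_key_edges(conn: sqlite3.Connection) -> List[Edge]:
    """Collect every foreign key in the database as an edge."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    edges = []
    for table in tables:
        # Rows are (id, seq, table, from, to, on_update, on_delete, match).
        for row in conn.execute(f'PRAGMA foreign_key_list("{table}")'):
            edges.append((table, row[3], row[2], row[4]))
    return edges


def to_dot(edges: List[Edge]) -> str:
    """Render the foreign keys as a Graphviz digraph (e.g. `dot -Tpng schema.dot`)."""
    lines = ["digraph schema {"]
    for table, col, ref_table, ref_col in edges:
        lines.append(f'  "{table}" -> "{ref_table}" [label="{col} -> {ref_col}"];')
    lines.append("}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (id INTEGER PRIMARY KEY,
                           customer_id INTEGER REFERENCES customer(id));
""")
print(to_dot(foreign_key_edges(conn)))
```

With SQLAlchemy, the equivalent loop runs over `metadata.tables.values()` after `metadata.reflect(bind=engine)`, and each table’s `foreign_keys` entries expose `parent` and `column` for the two ends of the edge.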

Graph Theory Basics, and Speech Recognition with Neural Nets

Here are the highlights from this week’s “Talking Machines” podcast from @tlkngmchns. Thank you Thunder for turning me on to this awesome podcast!

Language Trivia

Ever wonder why capital letters have mostly straight lines, especially in Latin? Carving is much easier with straight lines. Think of all those Greek and Roman buildings and their location names carved in stone. You’d straighten all the curves too if you had to carve someone’s name into a piece of granite. Lower case letters came much later in history, once we started writing with ink.

Install MongoDB on Fedora 20 for the Ubiquiti UniFi Access Point

Chick swears by his new Ubiquiti WiFi access point. So I purchased the High Power version from Amazon using Prime and it arrived in only 36 hours on a Saturday! Maybe having the Ubiquiti HQ here in Portland helped.

PyCon 2015 -- Predict Weather with PyBrain, Attribution Do-Over

Here’s an attribution “do-over” for my PyCon 2015 lightning talk. I didn’t even capitalize PyBrain correctly. So here’s my belated thank you to Lynn Root for herding us lightning-talk cats with grace, and to the videographer and sound crew who pulled off this technical juggling act without once dropping a ball. And a big thanks to the PyBrain creators led by IDSIA Professor Jürgen Schmidhuber, contributors, and supporters. PyBrain is an awesome library. My talk, and my work for my employer, wouldn’t have been possible without it. I can only blame my attribution FAIL on public speaking nerves and my inability to maintain a stable WiFi connection as I tried to create the slides in the seconds leading up to podium time.

PyCon 2015 -- Predict Weather With PyBrain

Here are the latest slides for a PyCon 2015 lightning talk on neural nets, “Predict Weather with PyBrain”, with a little help(er) from pug-ann. Apologies if you attempted to follow along and execute the code on the slides. WiFi dropped before I could save the updated slides.com (reveal.js) slides, so the slides didn’t reflect the latest version of pug-ann. I’ve got to start building slides locally. The typos were embarrassing. TL;DR: A 6-node neural net can predict the max temperature in Portland a day in advance with about 5 deg C (9 deg F) 1-sigma error.

Cleanup of Artificial Neural Net Subpackage (Module) for PUG

For February’s Python User Group I did a lightning talk and live demo using PyBrain to predict the weather. It took a whole weekend to pay off the code-quality debt from the hacking I did during Kyle Gorman’s awesome NLP talk.

PDDL Parser for AI Planning

If you need to parse PDDL for the AI Planning class at coursera, check out this script. It’s pretty basic and hasn’t been tested on the DWR problem descriptions, but I’m really enjoying playing around with my first “compiler”. I’m sure I’ve done things the “wrong way”, but the pyparsing package is very intuitive and seems forgiving of my mistakes.
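The post’s parser uses pyparsing. As a dependency-free alternative (a swap, not the post’s script), PDDL’s Lisp-style syntax can be read into nested Python lists with a short tokenizer and recursive-descent parser. The domain snippet below is a made-up fragment, not one of the course’s DWR files.

```python
import re
from typing import List, Union

Sexpr = Union[str, List["Sexpr"]]


def tokenize(text: str) -> List[str]:
    """Split PDDL into parentheses and atoms, dropping ';' comments."""
    text = re.sub(r";[^\n]*", "", text)
    return re.findall(r"[()]|[^\s()]+", text)


def parse(tokens: List[str]) -> Sexpr:
    """Parse one s-expression, consuming tokens from the front of the list."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items: List[Sexpr] = []
        while tokens and tokens[0] != ")":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # drop the closing ")"
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    return token


domain = """(define (domain dwr) ; dock-worker robots
  (:requirements :strips :typing))"""
print(parse(tokenize(domain)))
```

Once the file is nested lists, pulling out `:action` blocks, parameters, preconditions, and effects is plain list walking.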

Picture-in-Picture Talking Head Presentations

Once you have your reveal.js slides and live CYOA voting set up (see previous blog posts), now you need to record both your computer screen with the slides and a video of your talking head. This is how I did it for the “Creative Challenge” assignment in the coursera “AI Planning” class.

Predictive Analytics War Stories Video

Thank you David Barton and Innovation Enterprise for recording my presentation at the Predictive Analytics Summit in San Diego. It really knocked down my ego a notch to see my awkwardness. You’ve motivated me to practice.

Predictive Analytics War Stories

Reveal.js and slides.com enable remote-controlled presentations like this one at #PASanDiego. The dynamic voting slide has to be hosted separately, though, because the iframe doesn’t seem to refresh regularly.

Predictive Analytics Innovation Summit highlights

Clement Farabet, Twitter, presented some awesome demonstrations of image clustering using an open source Deep Learning library, Torch7. This is definitely my favorite talk so far at #PASanDiego

Predictive Analytics Innovation Summit highlights

The first day at #PASanDiego, organized by @IE_analytics, has been interesting. I haven’t heard a lot of controversial insights, but it’s been useful nonetheless.

Transparent Histograms

Spent a lot of this week working on prettifying bar charts, histograms and animations for some reveal.js slides.

Another Challenge Do-Over

I failed another coding challenge and couldn’t just put it out of my mind. The challenge is this. You’re given a passage with any number of sentences and words in it, but some of the words have slashes between them instead of spaces to indicate “or”, like “The brown/black/crazy cat crossed the road.” Your objective is to parse those strings and return a list of strings with all the possible alternative interpretations of the phrases. The unspoken, unmet challenge is to then process these alternatives to be the logical interpretations that a human would make, to resolve ambiguities when the slashed words aren’t all the same part of speech and aren’t intended to be just swapped for one another. Perhaps the ambiguity is whether the slash means “or” or “and”. In the 30 minutes I had, I never got past the recursion and book-keeping of the parsing. But here’s what I came up with, complete with doctests that pass.
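For comparison, `itertools.product` makes most of the recursion and bookkeeping disappear. This is a different approach from the recursive solution described above, and it only handles the mechanical “or” expansion, not the harder human-interpretation part. Every slash splits, so something like “1/2” would be expanded too.

```python
from itertools import product
from typing import List


def expand_slashes(text: str) -> List[str]:
    """Every reading of a passage where 'a/b/c' means 'a or b or c'.

    >>> expand_slashes("a/b c")
    ['a c', 'b c']
    """
    choices = [word.split("/") for word in text.split()]
    return [" ".join(combo) for combo in product(*choices)]


print(expand_slashes("The brown/black/crazy cat crossed the road."))
```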

Automata and Machine Intelligence

More and more, the smart people I meet are talking about Automata, Natural Language Processing, and Graph Search (AI/MI Planning) all in the same breath. I’ve taken MOOCs on all 3, but think I need to revisit automata. Math proofs rely on automata to model machine intelligence. And they are at the core of understanding what is possible with AI/MI. And I’m finding some interesting connections that I missed the first time around.

Love Python? Interested in NLP?

I gave an introduction to Natural Language Processing with python at the PDX python user group and showed how to use two of Bostock’s awesome graph optimization and visualization tools in his D3 library. Here’s a screenshot of one of my favorites:

Graph Search Using NetworkX

I’m having fun with a traveling salesman / minimum spanning tree problem over here. Check it out for pretty graph diagrams and some cool NetworkX Python examples.
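NetworkX provides `nx.minimum_spanning_tree(G)` (Kruskal’s algorithm by default). For a dependency-free look at what it computes, here is Prim’s algorithm with `heapq`; both give a tree of the same total weight. The city codes and edge weights are made up. An MST isn’t a traveling-salesman tour, but its weight is a lower bound on the shortest tour, which is why the two problems show up together.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def minimum_spanning_tree(graph: Graph, start: str) -> List[Tuple[str, str, float]]:
    """Prim's algorithm: grow the tree from `start`, always adding the cheapest
    edge that reaches a node not yet in the tree."""
    visited = {start}
    frontier = [(w, start, nbr) for nbr, w in graph[start].items()]
    heapq.heapify(frontier)
    tree = []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in visited:
            continue  # stale edge: v was reached more cheaply already
        visited.add(v)
        tree.append((u, v, w))
        for nbr, w2 in graph[v].items():
            if nbr not in visited:
                heapq.heappush(frontier, (w2, v, nbr))
    return tree


# Made-up edge weights (undirected: each edge listed in both directions).
cities: Graph = {
    "PDX": {"SEA": 280, "SFO": 1020, "BOI": 690},
    "SEA": {"PDX": 280, "BOI": 800},
    "SFO": {"PDX": 1020, "BOI": 830},
    "BOI": {"PDX": 690, "SEA": 800, "SFO": 830},
}
edges = minimum_spanning_tree(cities, "PDX")
print(edges, sum(w for _, _, w in edges))
```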

Artificial Neural Nets for Prediction with Python (pybrain)

I’ve forked the pybrain package and started to hobsonify it to suit my tastes, make it more pythonic, and correct some documentation errors that render some shortcuts unusable.

Finally a Decent Open Source Blog Framework

I’m loving this Jekyll thing. You won’t see many pull requests from me, but this thing sure is an efficient blogging tool.

You're up and running!

Next you can update your site name, avatar and other options using the _config.yml file in the root of your repository (shown below).