Linux Phone for Christmas

I’m excited about my new Christmas present for myself, a Linux phone. I really hope I can cut down on the distractions and manipulations by my OS and app providers. And I may soon be able to integrate nlpia-bot into it (after I finish hooking it up to mastodon.social). The decision between the Pine64 PinePhone and the Libre Phone developer edition was a no-brainer for a tinkerer like me.

Read More

Getting started with NLP

At Manceps, our interns are building web and mobile apps to interface with their natural language model for unredacting the Mueller report. Here are some of the exercises they used to get up to speed on Python and NLP quickly.

Read More

Nessvectors for San Diego Python User Group

I had a lot of fun playing with words at the monthly Python User Group meeting in San Diego this week. Congratulations to Torin Panick @torrinp for winning a free copy of NLP in Action. For those of you who missed out, I’ll give out one free eBook code and a 42% discount code next time. And I’ll be a bit more organized about the competition ;).

Read More

Unredact the Mueller Report?

What if the latest language models from Google were so good that they could unredact the Mueller Report? We gave it a shot at the monthly Portland Python User Group for May. BERT came up with some surprising results. The slides and code are here: PDF ODP py

Read More

Word Patterns

Word patterns are what you can use to match or generate phrases. They’re usually called grammars in math and computer science courses. But this post is about word grammars rather than character grammars. And the word “grammar” means something very precise to a lot of people, so I don’t want to step on any toes by using it incorrectly. So I’ll just talk about word patterns.

Read More

Infinite-vocabulary word embeddings

Word embeddings are at the core of the most impressive natural language models. Dialog systems, abstractive summarizers, universal sentence embeddings, question answering systems and even unsupervised knowledge extraction engines all rely on broad vocabularies of word embeddings. But even the 1M word vocabulary of Word2vec and GloVe embeddings isn’t broad enough to solve the most useful challenges for natural language processing, such as medical record summarization, or even dialog engines that can handle the ever-expanding vocabulary of teenagers.

Read More

SSH Server On Office PC Behind Building's NAT Router

Say you’re leasing space in an office building for your startup and you share the network with all the other tenants. This could be a wireless router or hard-wired ethernet router. The problem is you don’t have the password for the admin page on that router. So you can’t expose a port on your server for SSH or web hosting or whatever. Normally you’d just add a port-forwarding rule on the router to send ports 22, 80, and 443 through to your server. But that might mess up somebody else using the same router to serve up their page.

Read More

Data Science Trends

Springboard Data Science Careers students keep asking me which specialization they should pursue. And they often want to know which specializations are most likely to hire a junior data scientist coming out of Springboard. I try to encourage my students to pursue something that they are good at, because there will always be a market for someone who is good at what they do. But if you really want to follow the crowd and go where the employers are hiring, check out the AIIndex.org 2018 report. It looks like NLP was popular in 2016 and 2017 but may be overtaken by computer vision and “deep learning” by 2020. This roughly corresponds to the widespread deployment of self-driving cars, which will eventually replace approximately 10% of the US workforce with machines. And those driving and logistics jobs have been “transformed” into data science jobs over the past few years. So if you’re a full-time Lyft driver, now might be a good time to start taking night classes in data science and getting reconnected to your nerdy friends.

Read More

Nginx web server setup

Each time I have to set up a domain name service table or database access for a web server I forget how to do it. And there doesn’t seem to be a good online guide for it. So here are my notes.

Read More

Raspberry Pi Camera Configuration

I’ll eventually figure out where I put my notes on configuring a Raspberry Pi camera for streaming video and offline object detection. But for now, check out the BerryNet repo. These guys have done it right!

Read More

Open Source Teleprompter?

I’m recording some instructional videos for a Natural Language Processing In Motion course for Manning Publications and maybe a Data Science for Healthcare course for UCSD. I tried using Camtasia to simultaneously record the slides on one monitor and the talking head (webcam). And I tried using LibreOffice in presentation mode to show the slides/animations on my laptop screen and read from the external display (slide notes). But in display mode LibreOffice puts the notes to the right in the middle of the screen, so my eyes weren’t looking at the camera. Is there a better way to set up a “teleprompter” and webcam so that the top line is always near the webcam? I’d prefer open source and free or low cost.

Read More

Sentence Embedding

Sentence embeddings took off in 2017. When Google released their Universal Sentence Encoder last year, researchers took notice. Google trained their sentence embedding on a massive corpus of text, everything from Wikipedia and news articles to FAQs and forums. And then they refined the accuracy by training it on the Stanford Natural Language Inference corpus. Like word2vec, this enabled NLP enthusiasts to leverage Google’s text-scraping and cleaning infrastructure to build their own models using transfer learning. Transfer learning is just a fancy way of saying you’re using one model within another. Usually you’re just doing “activation” or “inference” with the pretrained model and then using its output as a feature (input) for some other model.
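
Here’s a minimal sketch of that “one model within another” idea. The encoder below is a crude toy stand-in that I made up for illustration, not the real Universal Sentence Encoder, and the function names and training sentences are hypothetical. The point is only the shape of the pipeline: the “pretrained” model is frozen and used for inference, and a small classifier is trained on its output.

```python
import math

DIM = 32  # size of the toy "sentence embedding"


def toy_pretrained_encoder(sentence):
    """Frozen stand-in for a pretrained sentence encoder (assumption, not USE).

    Each word is dropped into a bucket chosen from its character codes,
    then the bucket counts are L2-normalized.
    """
    vec = [0.0] * DIM
    for word in sentence.lower().split():
        vec[sum(ord(c) for c in word) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def train_classifier(sentences, labels, epochs=200, lr=0.5):
    """Train a small logistic regression on top of the frozen encoder."""
    # Inference only: the encoder's parameters are never updated.
    features = [toy_pretrained_encoder(s) for s in sentences]
    weights, bias = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = 1.0 / (1.0 + math.exp(-z)) - y
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(model, sentence):
    """Probability that the sentence belongs to class 1."""
    weights, bias = model
    x = toy_pretrained_encoder(sentence)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))


model = train_classifier(
    ["great movie", "loved it", "terrible movie", "hated it"], [1, 1, 0, 0]
)
```

With a real pretrained encoder you would swap out `toy_pretrained_encoder` and leave the downstream classifier code unchanged.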

Read More

Hyperspace Topology Games

Play around with these geometries in your brain. Then see what happens for real when you do this with high dimensional vectors, like word vectors (Mikolov).

Read More

Abstract Hyperparameter Optimization Machines Better Than Humans

Advances in neural networks and deep learning have renewed interest in algorithms to automate the tuning of the expanding list of hyper-parameters for these high-dimensional models. Open source libraries such as scikit-learn provide ready access to simple but inefficient algorithms such as exhaustive search and random search. Recently, Snoek et al. showed that statistical hyper-parameter optimization approaches produce better results than humans and are more efficient than exhaustive or random approaches in high-dimensional domains such as image and speech machine learning.[1] Similarly, Bergstra et al. improved efficiency and performance further with their Sequential Model-Based Global Optimization (SMGO) approach which approximates the computationally complex model training step with a heuristic.[2] In this paper we will demonstrate these hyper-parameter optimization algorithms on several toy and real-world problems, including machine learning problem types not previously optimized with SMGO.
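
As a point of reference, the two baseline algorithms mentioned above (exhaustive search and random search) can be sketched in a few lines of plain Python. This is only an illustration of the baselines, not the model-based approaches from the cited papers. The objective function `validation_loss` and the hyper-parameter names are hypothetical stand-ins for an expensive train-and-validate run.

```python
import itertools
import random


def validation_loss(learning_rate, num_layers):
    """Hypothetical stand-in for training a model and scoring it."""
    return (learning_rate - 0.1) ** 2 + 0.01 * (num_layers - 3) ** 2


def grid_search(grid):
    """Exhaustive search: evaluate every combination of the listed values."""
    names = sorted(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


def random_search(bounds, num_trials, seed=0):
    """Random search: sample each hyper-parameter independently per trial."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_trials):
        params = {
            "learning_rate": rng.uniform(*bounds["learning_rate"]),
            "num_layers": rng.randint(*bounds["num_layers"]),
        }
        loss = validation_loss(**params)
        if best is None or loss < best[0]:
            best = (loss, params)
    return best


grid_best = grid_search(
    {"learning_rate": [0.001, 0.01, 0.1, 1.0], "num_layers": [1, 2, 3, 4]}
)
random_best = random_search(
    {"learning_rate": (0.001, 1.0), "num_layers": (1, 4)}, num_trials=50
)
```

The grid costs one evaluation per combination (16 here), and that count multiplies with every hyper-parameter you add, which is the inefficiency the abstract refers to.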

Read More

XZ 7Zip Performance on the Latest Python 3.6 Source Code Release

With the Python 3.6 release today, I noticed the source package compression extension wasn’t one I was familiar with. Turns out it’s the old 7zip format updated for *nix file metadata (owner, permissions, sticky bit, etc). So I played around with it to see how it performs at its maximal and extremely maximal compression levels.
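
If you want to try a similar comparison from Python, the standard-library `lzma` module reads and writes the `.xz` format. This is a rough sketch, assuming “maximal” and “extremely maximal” correspond to preset 9 and preset 9 with the extreme flag. That mapping is my assumption, and the function name `xz_sizes` is made up for illustration.

```python
import lzma


def xz_sizes(data, preset=9):
    """Compare compressed sizes at a preset and its 'extreme' variant."""
    normal = lzma.compress(data, format=lzma.FORMAT_XZ, preset=preset)
    extreme = lzma.compress(
        data, format=lzma.FORMAT_XZ, preset=preset | lzma.PRESET_EXTREME
    )
    # Sanity check: both streams must decompress back to the input.
    if lzma.decompress(normal) != data or lzma.decompress(extreme) != data:
        raise ValueError("round trip failed")
    return {"original": len(data), "normal": len(normal), "extreme": len(extreme)}
```

Calling `xz_sizes(open(path, "rb").read())` on a tarball gives you the three byte counts to compare. Higher presets use noticeably more memory and time, so lower the `preset` argument on small machines.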

Read More

Hyper-Indexing with LSHash (Locality Sensitive Hashing)

Indexing topic vectors from an LSI Model is more difficult than it seems. My first instinct was to use the 3D indexer plugin for PostgreSQL, PostGIS. After all, that’s the typical example I keep in my head for indexing. You create a discrete “on or off” label for each location based on whether it is present or absent within a grid point. This allows you to efficiently find it (and any nearby points) with a query containing WHERE grid = 'A11' for a letter/int 2D indexing system like the one you see on old paper road maps from AAA.
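
The road-map grid idea can be sketched with a plain dictionary. This is just an illustration of the discrete grid label, not PostGIS and not locality sensitive hashing. The function names and the one-unit cell size are assumptions.

```python
import string
from collections import defaultdict


def grid_label(x, y, cell_size=1.0):
    """Map a 2D point to a road-map style label: column letter + row number."""
    col = int(x // cell_size)
    row = int(y // cell_size)
    if not 0 <= col < len(string.ascii_uppercase) or row < 0:
        raise ValueError("point falls outside the lettered grid")
    return f"{string.ascii_uppercase[col]}{row + 1}"


def build_index(points, cell_size=1.0):
    """Group points by grid cell so a lookup only touches one bucket."""
    index = defaultdict(list)
    for point in points:
        index[grid_label(point[0], point[1], cell_size)].append(point)
    return index


index = build_index([(0.5, 10.2), (0.9, 10.7), (3.1, 0.4)])
nearby = index["A11"]  # the dictionary equivalent of WHERE grid = 'A11'
```

The number of cells grows exponentially with the number of dimensions, which is one reason this approach gets harder for high-dimensional topic vectors.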

Read More

Automation-Safer-Than-Manual

Interactive automation is much better than fully manual keyboard bashing for a lot of Linux tasks. It’s taken decades, but many Linux distributions have finally made it possible to install Linux automatically without too much hassle. But other mundane tasks like adding or swapping out a hard drive are a real bear. And the online instructions (especially at Canonical’s Ubuntu docs site) sound overly protective and cautious, encouraging the user to do everything by hand instead of automating things with a script. And they often get critical steps wrong, endangering your data and your computer.

Read More

Python-Birth-Microsecond-Paradox

Cole got bitten by the Birthday Paradox when using Python’s random.randint() and time.time() to generate a random number to tag a DB record with a unique ID. I think Hannes does something similar to ensure user-provided files are all unique, even if a user uploads the exact same file twice.
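
To see why this bites sooner than intuition suggests, here’s a small sketch of the standard birthday-problem calculation. The function name and the example numbers are mine, chosen for illustration: it computes the chance that at least two of `num_ids` randomly drawn IDs collide when there are only `num_possible_values` to draw from.

```python
def collision_probability(num_ids, num_possible_values):
    """Probability that at least two of num_ids random draws are equal."""
    if num_ids > num_possible_values:
        return 1.0  # pigeonhole: a collision is guaranteed
    p_all_unique = 1.0
    for i in range(num_ids):
        p_all_unique *= (num_possible_values - i) / num_possible_values
    return 1.0 - p_all_unique


# Classic case: 23 people and 365 birthdays is already better than even odds.
p_birthday = collision_probability(23, 365)

# Hypothetical ID scheme: a random integer below one million per record.
p_ids = collision_probability(1200, 10**6)
```

The collision odds grow roughly with the square of the number of IDs, so a range that looks huge fills up fast. The standard library’s `uuid.uuid4()` is one option with a far larger space.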

Read More

History Temp Panic

ls /usr/local
rm siteconf.p
ls /usr/include/boost
ls /usr/include/
sudo apt install boost
sudo apt install python-boost
workon hope
pip3 install boost
pip3 search boost
exit
nvcc
nvcc --help
history | grep BLAS
exit
ls -al
cd src
ls -al
tar -xvzf boost_1_61_0.tar.gz
cd boost_1_61_0/
ls -al
sudo apt install python3-dev
sudo apt install python3-devel
sudo apt install python-dev
history | grep BLAS
exit
history | grep BLAS
sudo apt search BLAS
sudo apt install libopenblas-*
cd src
ls -al
# sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt-get install linux-headers-$(uname -r)
sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt uninstall cuda
sudo apt-get uninstall cuda
sudo apt-get remove cuda
ls /usr/local/
ls /usr/local/cuda
ls /usr/local/cuda/README
more /usr/local/cuda/README
more /usr/local/cuda/version.txt
more /usr/local/cuda-7.5/version.txt
sudo apt-get --purge remove cuda-7.5
ls /usr/local/cuda/README
ls /usr/local/cuda/
ls /usr/local/cuda-7.5/
sudo dpkg -i cuda-repo-ubuntu1504-7-5-local_7.5-18_amd64.deb
sudo apt-get update
sudo apt-get install cuda
sudo shutdown -r now
modprobe nvidia
more /etc/modules
sudo modprobe nvidia
sudo modprobe nvidia-uvm
ls /usr/local
nano /etc/bash.bashrc
exit
nano .theanorc
workon hope
ipython
python3 -c 'import theano; theano.test()'
exit
update-alternatives g++
update-alternatives --list
update-alternatives --list g++
update-alternatives --list gcc
gcc -V
gcc -v
g++ -v
exit
htop
PW="`echo $UN | md5sum | cut -c2-7`"
UN=hannes
PW="`echo $UN | md5sum | cut -c2-7`"
echo $UN | md5sum | cut -c2-7
echo zak | md5sum | cut -c2-7
echo cole | md5sum | cut -c2-7
echo hannes | md5sum | cut -c2-7
exit
sudo apt-get install mpixx
mpixx
sudo apt-get install libopenmpi-dev
cd src
cd devops/
ls -al
cd scripts/
ls
nano addusers
chmod +x addusers
sudo ./addusers hannes cole matt erin andrew
sudo ./addusers riley
cd /usr/local/cuda-7.5/samples/
ls -al
cd bin
ls -al
cd x86_64/
ls
cd linux/
ls
cd release/
ls
cd ..
cd release
./deviceQuery
./bandwidthTest
sudo apt install cuda-gdb-src
sudo apt search cuda-gdb-src
sudo apt-get install cuda-gdb-src
cd ~/src/
ls
sudo apt-get update
sudo apt-get install cuda-gdb
sudo apt-get install cuda-command-line-tools-7-5-src
sudo apt-get install cuda-command-line-tools-7-5-devel
sudo apt-get install cuda-command-line-tools-7-5-dev
echo -e "\n[nvcc]\nflags=-D_FORCE_INLINES\n" >> ~/.theanorc
workon hope
pip install Theano
pip uninstall Theano
pip remove Theano
pip purge Theano
pip cleanup
pip install Theano
python3 -c "import theano;"
python3 -c "import theano; theano.test()"
cd ../devops/
nano configure_theano
chmod +x configure_theano
./configure_theano
sudo apt install gcc-5.2
sudo apt install gcc-5.2.1
sudo apt install --update gcc
sudo apt install --upgrade gcc
which gcc
ls /usr/bin/gcc
/usr/bin/gcc
/usr/bin/gcc -v
python -c "import theano; theano.test()"
nano ~/.theanorc
python -c "import theano; theano.test()"
pip install git+https://github.com/dnouri/nolearn.git@master#egg=nolearn==0.7.git
pip install scipy
cd ..
cd devops/
ls
mv configure_theano scripts/
cd ..
sudo cp .theanorc ~hannes/
sudo su hannes
sudo chown hannes:hannes ~hannes/src/*
sudo chown hannes:hannes ~hannes/.theanorc
sudo chown -R hannes:hannes ~hannes/src/
sudo su hannes
pip install lasagna
pip install Lasagna
pip install https://github.com/Lasagne/Lasagne/archive/master.zip
pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
sudo pip install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
deactivate
exit
cd src
cd devops/
ls
cd scripts/
ls
addusers thunder
sudo ./addusers thunder
more addusers
echo hannes | md5sum | cut -c2-7
echo thunder | md5sum | cut -c2-7
exit
ls
cd src
ls
tar -xvzf cudnn-7.5-linux-x64-v5.1-rc.tgz
cd cuda
ls -al
export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
exit
env
env | grep LD
nano /etc/bash.bashrc
nvcc -V
ls -al
cd src
ls -al
cd cuda
ls -al
cd ..
cd /usr/local
ls -al
cd cuda-7.5/
ls -al
cd samples
ls -al
make
gcc -V
gcc -v
sudo update-alternatives --remove-all gcc
sudo update-alternatives --remove-all g++
sudo apt-get install gcc-4.9
sudo update-alternatives --remove-all g++
sudo apt-get install g++-4.9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.9 10
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.9 10
make
sudo make
cd ~/
workon hope
pip install -r https://raw.githubusercontent.com/dnouri/nolearn/master/requirements.txt
pip install git+https://github.com/dnouri/nolearn.git@master#egg=nolearn==0.7.git
ipython
pip freeze > requirements.gpu.txt
more requirements.gpu.txt
nano requirements.gpu.txt
cp requirements.gpu.txt ../../../hannes/
sudo cp requirements.gpu.txt ../../../hannes/
sudo chmod a+r ../../../hannes/requirements.gpu.txt
sudo chmod a+w ../../../hannes/requirements.gpu.txt
sudo chmod a+d ../../../hannes/requirements.gpu.txt
sudo chmod a+x ../../../hannes/requirements.gpu.txt
sudo su hannes
sudo pip3 install theano
sudo pip3 install --upgrade pip
sudo pip3 install --upgrade theano
cd src
sudo su hannes
sudo pip3 install --upgrade https://github.com/Lasagne/Lasagne/archive/master.zip
sudo pip3 install --upgrade https://github.com/Theano/Theano/archive/master.zip
ls -al
cd devops
sudo nano test_theano_gpu.py
python3 test_theano_gpu.py
cd ..
cd devops
mv test_theano_gpu.py scripts/
cd ..
mv cuda cuda_base
tar -xvzf cudnn-7.5-linux-x64-v5.1-rc.tgz
ls -al
cd cuda
which nvcc
ls -al
sudo cp -P include/cudnn.h /usr/local/cuda-7.5/include
sudo cp -P lib64/libcudnn* /usr/local/cuda-7.5/lib/x86_64-linux-gnu/
sudo cp -P lib64/libcudnn* /usr/local/cuda-7.5/lib64/
sudo chmod a+r /usr/local/cuda-7.5/lib64/libcudnn*
cd /usr/local/cuda-7.5/
ls -al
cd ~/src/devops
ls -al
cd scripts/
git status
ls -al
sudo chmod +x test_theano_gpu.py
./test_theano_gpu.py
sudo pip install twip
sudo pip install pug-nlp
workon hope
ls -al
cd hope
ls -al
cd hope
ls -al
cd ..
htop
w
sudo su hannes
python
ipython
write hannes "noticed that your env defaults to python2.7 mine is python 3.4.3 for all the root-install theano/nolearn stuff"
sensors
write hannes < `sensors`
sensors | write hannes
write hannes "Nothing to worry about yet, but CPU is getting hot. Type `sensors` to see status"
write hannes "Nothing to worry about yet, but CPU is getting hot. Type 'sensors' to see status"
write hannes "I have $2 CPU cooler/radiator from free geek without any thermal paste, etc"
write hannes "So probably need to get a water-cooled radiator if you are going to run it continuously for long periods"
sensors
sensorsensor
sensors
htop
sensors

Read More

Cpu Gpu Temp Sensors Log

(hope)hobs@hobs-black-gpu:~/src/hope/hope$ write hannes "noticed that your env defaults to python2.7 mine is python 3.4.3 for all the root-install theano/nolearn stuff"
write: hannes is not logged in on noticed that your env defaults to python2.7 mine is python 3.4.3 for all the root-install theano/nolearn stuff
(hope)hobs@hobs-black-gpu:~/src/hope/hope$ sensors
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +75.0°C (high = +77.0°C, crit = +87.0°C)
Core 0: +62.0°C (high = +77.0°C, crit = +87.0°C)
Core 1: +69.0°C (high = +77.0°C, crit = +87.0°C)
Core 2: +67.0°C (high = +77.0°C, crit = +87.0°C)
Core 3: +71.0°C (high = +77.0°C, crit = +87.0°C)
Core 4: +64.0°C (high = +77.0°C, crit = +87.0°C)
Core 5: +67.0°C (high = +77.0°C, crit = +87.0°C)
Core 6: +64.0°C (high = +77.0°C, crit = +87.0°C)
Core 7: +65.0°C (high = +77.0°C, crit = +87.0°C)

Read More

PenTesting Peanut Gallery

Really enjoyed getting a crash course in InfoSec and PenTesting from Dean at the Ctrl-H HackerSpace meetup. Here’s how to get some tools for easy, ethical hacking.

Read More

Wildlife Survey and Cowboy Drone

I spend a lot of time hiking around in the snow taking pictures of animal tracks and maintaining wildlife survey cameras for Cascadia Wild. And I can’t help but daydream about Drone/Robot assistants doing a lot of this for me.

Read More

Dual Boot HP Spectre 360 Laptop

I love my new Spectre laptop with the fold-back screen. It’ll make an awesome picture frame or navigation tablet at the end of its life. But to keep it relevant I configured it for dual boot with Ubuntu. I need Windows 10 because QuickBooks still hasn’t gotten with the Open Source program.

Read More

HPC on a Budget

The halfling (half-length) PCIe Nvidia GeForce 970 card I ordered requires only 1 PCIe 3 slot, but it also needs physical clearance for the connectors to poke out through 2 slots in the back of the chassis. So form-factor planning can be a bitch. The Free Geek chassis I’m using has all the PCI slots free (including a PCIe 2), so there are plenty of holes in the metal chassis, but the PCIe 2.0 slot is at the wrong end of the series, and the Nvidia card needs the blocked side for its connectors. Back to Newegg she goes.

Read More

PyPi Packaging with PyScaffolding

PyScaffold (pip install PyScaffold or pyscaffold) is awesome tooling. It adds a nice putup command to your shell. The putup command creates a boilerplate directory structure for any Python project. It can even set up .tox and .travis test config files, documentation build scripts, and a Django project for you, if you ask it to. And it is very git aware. The only thing I add to my git hooks is a pandoc line to translate my README.md into README.rst so that both my GitHub-trained fingers and ReST-loving PyPI can be happy.
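
The README conversion step might look something like the sketch below if you drive pandoc from Python rather than a shell one-liner. It assumes pandoc is installed and on your PATH, and the function names are made up for illustration. The exact line in my hooks may differ.

```python
import subprocess


def pandoc_command(source="README.md", target="README.rst"):
    """Build the pandoc invocation that converts Markdown to reStructuredText."""
    return ["pandoc", "--from=markdown", "--to=rst", f"--output={target}", source]


def convert_readme(source="README.md", target="README.rst"):
    """Run pandoc; raises CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, target), check=True)
```

Calling `convert_readme()` from an executable script in `.git/hooks/` (for example a pre-commit hook) would keep the two README files in sync on every commit.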

Read More

Neural Net Brainstorm

Cole’s class on neural nets inspired some “out of the box” thinking about how brains work and how we train neural nets. Students asked about the performance of regularization vs random dropout, and the computational bottlenecks for random dropout.

Read More

Your Own Private Cloud and NAS Drive

The Buffalo Airport Extreme is pretty expensive ($100), but when coupled with a cheap multi-TB USB 3.0 drive, it makes a pretty nice personal cloud. You can even download all of the Wikipedia and Wikimedia Commons dumps directly to the drive without passing through your precious laptop SSD. 10 Mbps rates are no problem for most USB 3.0 drives.

Read More

Getting Started with your PiBot TiddlyBot

I helped my teenage nephew get started on his Kickstarter TiddlyBot Christmas present over the holidays. We used a Linux laptop (Ubuntu) and recorded all the tedious setup steps so you can spend more time programming your bot and less time getting set up.

Read More

B-Machine Learning

The “B” isn’t for Bot, it’s for “Benefit”, as in B-Corporation. What do B-Corps have to do with Machine Learning?

Read More

Inspiring Night -- John Irving Explaining his Craft

It was inspiring, almost magical, listening to John Irving explain his art, his insight into life, at Portland Art Museum. OPB hosted him with the towering church organ of the First Congregational United Church of Christ as a backdrop. John Irving’s intellect and humor eventually dwarfed the organ.

Read More

Gaussian Mixture Model

Working on this Kaggle challenge (Otto Product Categorization), it’s becoming clear that the most appropriate hard-coded model is a Bayesian Classifier. And you don’t need the “gamification” clues to tell you that. Though the clues helped. “I’m a strict Bayesian, you know” was the acknowledgment message I received last week with my first decision-tree submission (within spitting distance of the benchmark). Clever. I love Kaggle for this! For the same reason I love Stack Overflow… they use influence techniques for the TotalGood rather than focusing on monetization (their own financial gain).

Read More

Model and Diagram Any Database Using SQLAlchemy

I needed to model and diagram (ERD) a client's database schema in order to understand their machine learning task. They don't use Django, so I can't just `manage.py inspectdb` and [`manage.py graph_models`](http://django-extensions.readthedocs.org/en/latest/graph_models.html). But fortunately, SQLAlchemy makes both of these tasks easy.

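
For the modeling half, here is a minimal sketch of schema reflection with SQLAlchemy, assuming SQLAlchemy 1.4 or later is installed. The in-memory SQLite database and the `customer` and `invoice` tables are hypothetical stand-ins for the client's database, and the diagramming step is not shown.

```python
from sqlalchemy import MetaData, create_engine, text


def describe_schema(engine):
    """Reflect every table and list its columns and foreign keys."""
    metadata = MetaData()
    metadata.reflect(bind=engine)  # reads table definitions from the database
    schema = {}
    for name, table in metadata.tables.items():
        schema[name] = {
            "columns": [column.name for column in table.columns],
            "foreign_keys": sorted(
                f"{fk.parent.name} -> {fk.column.table.name}.{fk.column.name}"
                for fk in table.foreign_keys
            ),
        }
    return schema


def demo_engine():
    """Build a throwaway in-memory SQLite database with two made-up tables."""
    engine = create_engine("sqlite://")
    with engine.begin() as conn:
        conn.execute(text("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)"))
        conn.execute(text(
            "CREATE TABLE invoice (id INTEGER PRIMARY KEY, "
            "customer_id INTEGER REFERENCES customer (id), total REAL)"
        ))
    return engine


schema = describe_schema(demo_engine())
```

Point `create_engine` at a real database URL instead of `demo_engine()` and the same `describe_schema` call lists the tables and the relationships you would draw in an ERD.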
Read More

Language Trivia

Ever wonder why capital letters have mostly straight lines, especially in Latin? Carving is much easier with straight lines. Think of all those Greek and Roman buildings and their location names carved in stone. You’d straighten all the curves too if you had to carve someone’s name into a piece of granite. Lower case letters came much later in history, once we started writing with ink.

Read More

PyCon 2015 -- Predict Weather with PyBrain, Attribution Do-Over

Here’s an attribution “do-over” for my PyCon 2015 lightning talk. I didn’t even capitalize PyBrain correctly. So here’s my belated thank you to Lynn Root for herding us Lightning Talk cats with grace, and the videographer and sound crew that pulled off this technical juggling act without once dropping a ball. And a big thanks to the PyBrain creators led by IDSIA Professor Jürgen Schmidhuber, contributors, and supporters. PyBrain is an awesome library. My talk, and work for my employer, wouldn’t have been possible without it. I can only blame my attribution FAIL on public speaking nerves and my inability to maintain a stable WiFi connection as I tried to create the slides in the seconds leading up to podium time.

Read More

PyCon 2015 -- Predict Weather With PyBrain

Here are the latest slides for a PyCon 2015 lightning talk on neural nets “Predict Weather with PyBrain”, with a little help(er) from pug-ann. Apologies if you attempted to follow along and execute the code on the slides. WiFi dropped before I could save updated slides.com reveal.js slides. So the slides didn’t reflect the latest version of pug-ann. I’ve got to start building slides locally. The typos were embarrassing. TL;DR: A 6-node neural net can predict the max temperature in Portland a day in advance with about 5 deg C (10 deg F) 1-sigma error.

Read More

PDDL Parser for AI Planning

If you need to parse PDDL for the AI Planning class at Coursera, check out this script. It’s pretty basic and hasn’t been tested on the DWR problem descriptions, but I’m really enjoying playing around with my first “compiler”. I’m sure I’ve done things the “wrong way”, but the pyparsing package is very intuitive and seems forgiving of my mistakes.

Read More

Another Challenge Do-Over

I failed another coding challenge and couldn’t just put it out of my mind. The challenge is this. You’re given a passage with any number of sentences and words in it, but some of the words have slashes between them instead of spaces to indicate “or”, like “The brown/black/crazy cat crossed the road.” Your objective is to parse those strings and return a list of strings with all the possible alternative interpretations of the phrases. The unspoken, unmet challenge is to then process these alternatives to be the logical interpretations that a human would make, to resolve ambiguities when the slashed words aren’t all the same part of speech and aren’t intended to be just swapped for one another. Perhaps the ambiguity is whether the slash means “or” or “and”. In the 30 minutes I had, I never got past the recursion and book-keeping of the parsing. But here’s what I came up with, complete with doctests that pass.
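
For readers who want to try the parsing half themselves, here is one possible sketch. It is not the solution from the full post: it swaps the recursion for `itertools.product`, treats every slash as a plain “or”, and assumes words are separated by single spaces. The function name is made up for illustration.

```python
import itertools
import re


def expand_alternatives(sentence):
    """Return every sentence you get by picking one word per slash group."""
    options = []
    for token in sentence.split(" "):
        # Split off trailing punctuation so it follows every alternative.
        match = re.fullmatch(r"(.*?)([.,;:!?]*)", token)
        body, punctuation = match.group(1), match.group(2)
        options.append([word + punctuation for word in body.split("/")])
    # The Cartesian product picks one alternative from each position.
    return [" ".join(choice) for choice in itertools.product(*options)]


results = expand_alternatives("The brown/black/crazy cat crossed the road.")
# results == ['The brown cat crossed the road.',
#             'The black cat crossed the road.',
#             'The crazy cat crossed the road.']
```

The number of results is the product of the group sizes, so a passage with many slash groups blows up quickly. The harder “logical interpretation” part of the challenge is not attempted here.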

Read More

Automata and Machine Intelligence

More and more, the smart people I meet are talking about Automata, Natural Language Processing, and Graph Search (AI/MI Planning) all in the same breath. I’ve taken MOOCs on all 3, but think I need to revisit automata. Math proofs rely on automata to model machine intelligence. And they are at the core of understanding what is possible with AI/MI. And I’m finding some interesting connections that I missed the first time around.

Read More

Graph Search Using Networkx

I’m having fun with a traveling salesman, minimum spanning tree problem over here. Check it out for pretty graph diagrams and some cool NetworkX Python examples.

Read More

You're up and running!

Next you can update your site name, avatar and other options using the _config.yml file in the root of your repository (shown below).

Read More