Make sure to go and test Buddy on your own grow!

In 2018, it became easier than ever to grow your own spectacular Californian cannabis flowers. But producing beautiful buds is not without its challenges.

When things get out of whack, your plant has only visual cues to communicate the problem, and learning to diagnose different stress signals takes experience. This makes a computer-vision-based approach to stress diagnosis a natural fit.

This post will illustrate how to build your own data flywheel to solve a problem using our cannabis diagnosis engine, Buddy, as an example.

Gathering Data

Besides gathering our own images, we begin with what can be found using Google Image Search. After reviewing a few dozen photos, we can refine our search queries to get the best results.

We use the Firefox plugin Google Image Downloader to retrieve several hundred examples. We then manually filter out irrelevant or malformed images.

For some disease or stress categories, example data is harder to find. Downloading YouTube videos, splitting them into frames, and subsampling those frames can provide additional training images. Use this gist to download a list of YouTube videos based on a query. After you've split the videos, you can use our video editing tool to curate useful, distinct images.
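The frame subsampling step can be sketched as follows. This is a minimal example, assuming OpenCV (the opencv-python package) is installed. The function names frame_indices_to_keep and extract_frames, the two-second interval, and the output file naming are our own illustrative choices, not part of the gist or the video editing tool mentioned above.

```python
import os


def frame_indices_to_keep(total_frames, fps, every_seconds=2.0):
    """Return the indices of frames spaced roughly `every_seconds` apart."""
    if total_frames <= 0 or fps <= 0:
        return []
    # Never use a step below 1, so a tiny interval keeps every frame.
    step = max(1, int(round(fps * every_seconds)))
    return list(range(0, total_frames, step))


def extract_frames(video_path, out_dir, every_seconds=2.0):
    """Save subsampled frames of one video as JPEG files; returns the saved paths."""
    import cv2  # assumption: opencv-python is installed; imported lazily

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    keep = set(frame_indices_to_keep(total, fps, every_seconds))
    saved, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or unreadable frame
            break
        if idx in keep:
            path = os.path.join(out_dir, "frame_{:06d}.jpg".format(idx))
            cv2.imwrite(path, frame)
            saved.append(path)
        idx += 1
    cap.release()
    return saved
```

A longer interval yields fewer near-duplicate frames to curate by hand, at the cost of possibly missing a short close-up of a symptom.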

When you look through the ImageNet corpus, you will find numerous links to cannabis photos. With these links as base URLs, we can scrape the web for more training data. Try out Scrapy to gather image links and metadata, using good etiquette.
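Good etiquette usually includes honoring a site's robots.txt and pacing your requests. Scrapy exposes settings for this (ROBOTSTXT_OBEY and DOWNLOAD_DELAY). The sketch below shows the robots.txt check on its own, using only the Python standard library. The helper names and the user agent string buddy-scraper are hypothetical, and the robots.txt content is an inline example rather than a real site's file.

```python
from urllib.robotparser import RobotFileParser


def build_robot_rules(robots_txt):
    """Parse the text of a robots.txt file into a rule set."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


def filter_allowed(urls, rules, user_agent="buddy-scraper"):
    """Keep only the URLs that the site's robots.txt lets us fetch."""
    return [u for u in urls if rules.can_fetch(user_agent, u)]


# Example with an inline robots.txt; in practice, fetch it from the site root.
example_rules = build_robot_rules("User-agent: *\nDisallow: /private/\n")
candidates = [
    "https://example.com/gallery/leaf1.jpg",
    "https://example.com/private/leaf2.jpg",
]
allowed = filter_allowed(candidates, example_rules)
```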

Here's an example of how to use the Python requests library to extract image URLs. First, take a look at the HTML of your target site. In Firefox, you can right-click the targeted element and select Inspect Element. The element will appear highlighted in the code shown in the inspector below.

how to webscrape

Now use the lxml library to gather all the images on the page. We recommend you familiarize yourself with XPath so you can extract any kind of information from the HTML code.

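import os
from urllib.parse import urljoin
from urllib.request import urlretrieve

import requests
from lxml import html

url = ""  # URL of the target site; add your own here
out_dir = "/path/to/imgs"  # directory where the images will be stored
page = requests.get(url, timeout=10)
page.raise_for_status()  # stop early on an HTTP error
# Generate a tree of the HTML to traverse
tree = html.fromstring(page.content)

# Create a list of image links; every image we want matches this XPath
image_list = tree.xpath('//p[@class="rtecenter"]//a//img/@src')
# Now download all the links and store them in a directory
os.makedirs(out_dir, exist_ok=True)
for idx, link in enumerate(image_list):
    # src attributes may be relative, so resolve them against the page URL
    urlretrieve(urljoin(url, link), os.path.join(out_dir, "{}.jpg".format(idx)))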

Fine Tuning a Classifier

The code in TensorFlow for Poets is one of the simplest ways to start training an image classifier. It uses fine-tuning to specialize a model that performs generic image classification into one adapted to your application.

To start, separate your images into a directory structure named according to category. You may choose to augment some categories to achieve greater balance by adding randomly transformed renditions of your samples. Imgaug offers a rich toolkit for this purpose.
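Before augmenting, it helps to know how far each category is from balance. The following is a minimal sketch that counts the images in each category directory and reports how many augmented renditions each one would need to match the largest category. The function names, the accepted file suffixes, and the "match the largest class" target are our own assumptions. The sketch does not call Imgaug itself.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_images_per_category(root):
    """Count image files in each category subdirectory of `root`."""
    counts = {}
    for category_dir in sorted(Path(root).iterdir()):
        if category_dir.is_dir():
            counts[category_dir.name] = sum(
                1 for f in category_dir.iterdir()
                if f.suffix.lower() in IMAGE_SUFFIXES
            )
    return counts


def augmentation_plan(counts):
    """Return how many augmented images each category needs to match the largest."""
    if not counts:
        return {}
    target = max(counts.values())
    return {name: target - n for name, n in counts.items()}
```

For example, augmentation_plan({"healthy": 500, "heat_stress": 120}) asks for 380 extra heat_stress images and none for healthy. These category names are made up for illustration.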

Check out the Keras blog for an overview of fine-tuning your image classifier on limited data, or the Jupyter notebooks associated with “Hands-On Machine Learning with Scikit-Learn and TensorFlow”.

Serve your AI application

With a lightweight model, we can perform fast inference under a web server to help others benefit from the wisdom of cannabis diagnosis distilled into our AI application.

Flask is a simple, Python-based web application framework that we use along with a web server like Apache or Nginx to serve Buddy model results.

Now that you have a useful model, you'll want to add an intuitive, easy-to-use front end. Keep your design simple and easy to understand. For example, we added instructions on how to submit photos and a Dropify image upload box for users to upload an image. After submitting the form, the user receives the top three predictions from our model.
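The upload-and-predict flow might look roughly like the sketch below. It is not Buddy's actual server code: the /predict route, the create_app and top_k_predictions helpers, and the predict_fn callable (which stands in for your model and maps image bytes to a list of class probabilities) are all hypothetical. It assumes Flask is installed and that the form field is named file.

```python
def top_k_predictions(probabilities, labels, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def create_app(predict_fn, labels):
    """Build a Flask app; `predict_fn` maps image bytes to class probabilities."""
    from flask import Flask, jsonify, request  # assumption: Flask is installed

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        upload = request.files.get("file")
        if upload is None:
            return jsonify({"error": "no file uploaded"}), 400
        probabilities = predict_fn(upload.read())
        top = top_k_predictions(probabilities, labels, k=3)
        return jsonify({
            "predictions": [
                {"label": label, "probability": float(prob)} for label, prob in top
            ]
        })

    return app
```

Keeping the ranking logic in a plain function makes it easy to test without starting a server.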

buddy gif

Buddy on the Go with Android

If you're using TensorFlow for Poets, you may find the sequel a worthwhile starting point for deploying your model on mobile devices running Android.

buddy gif

Buddy More Generally

The value of a machine learning algorithm relates to its performance on data not seen during training. Because the ImageNet dataset includes many 'in-the-wild' samples from 1000 categories, image classifiers pretrained on this dataset encode a good deal of knowledge about our world. Therefore, it is no surprise that our models treat photos of different plant vegetation with some equivalence. Although our models used only cannabis photos to learn indicators of nutrient deficiencies or environmental stressors, we find Buddy performs quite well in identifying these indicators in other plant species. This suggests that our models generalize beyond their training data, which supports the quality of Buddy's results.

generalize examples

Scaling Up Buddy Inference

Some users, particularly those with large-scale commercial applications in mind, will prefer a simple interface for analyzing many images at once.

This makes an API a natural choice, so that users can send a batch of images for analysis. We allow users to submit images via a RESTful API, for example:

curl -X POST -F "file=@path/to/img.jpg"

APIs for high-throughput crop analysis can offer suggestions based on an aggregate view of the findings to help growers match feeding and growing protocols with the ideal environmental requirements for each strain.
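One simple form of aggregate view is the share of images assigned to each diagnosis. The sketch below assumes each image has already been reduced to its top prediction as a (label, confidence) pair. The function name, the 0.5 confidence cutoff, the "uncertain" bucket, and the label names in the usage note are illustrative assumptions, not part of the Buddy API.

```python
from collections import Counter


def aggregate_findings(results, min_confidence=0.5):
    """Summarize per-image predictions as the share of images per diagnosis.

    `results` is a list of (label, confidence) pairs, one top prediction per image.
    Predictions below `min_confidence` are counted as "uncertain".
    """
    if not results:
        return {}
    counts = Counter(
        label if confidence >= min_confidence else "uncertain"
        for label, confidence in results
    )
    total = len(results)
    return {label: count / total for label, count in counts.most_common()}
```

For a batch where two of four images are confidently healthy, one shows nitrogen_deficiency, and one falls below the cutoff, the summary is 50% healthy, 25% nitrogen_deficiency, and 25% uncertain.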

Bootstrapping Buddy

Now that we can serve model results to Kindbot Buddy users, we would like to use new images to bootstrap our classifier on data not seen during training. However, for Buddy to scale to many new users, we must avoid manually labeling all the new images.

Assuming the initial model performs well, we can simply treat the inference results as ground truth. However, errors in this approach may eventually degrade the quality of new models.

If we sort inference results by class probabilities, we can use our initial model as a filter to reduce the workload for human reviewers in making the final determination on class labels.
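A minimal sketch of this filtering idea follows. It combines the two approaches above: predictions at or above a confidence threshold are accepted as labels, and the rest go to a human review queue, least confident first. The function name, the tuple layout, and the 0.95 threshold are assumptions for illustration. A stricter setup would send everything to reviewers and use the sort order only to prioritize.

```python
def build_review_queue(predictions, auto_accept=0.95):
    """Split predictions into auto-labeled items and a human review queue.

    `predictions` is a list of (image_id, label, confidence) tuples.
    """
    accepted = [p for p in predictions if p[2] >= auto_accept]
    # Least confident first: these benefit most from a human look.
    review = sorted(
        (p for p in predictions if p[2] < auto_accept), key=lambda p: p[2]
    )
    return accepted, review
```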

The Buddy Mission

Using Buddy is fast and frictionless, and the performance of the deep learning model scales well with the number of training samples. This makes Buddy an ideal vehicle for learning to say something useful about cannabis photos.

To continue improving your model, save the images users submit to your application, add them to your training corpus, and periodically retrain.
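Saving submissions can be as simple as writing each upload to a directory named after the predicted label. The sketch below names files by their SHA-256 digest so that the same photo submitted twice is stored only once. The save_submission helper, the directory layout, and the .jpg extension (which assumes JPEG uploads) are our own illustrative choices. The predicted label is only a provisional label until someone reviews it.

```python
import hashlib
import tempfile
from pathlib import Path


def save_submission(image_bytes, predicted_label, root):
    """Store an uploaded image under root/<predicted_label>/<sha256>.jpg.

    Returns (path, is_new); an image that is already stored is not rewritten.
    """
    digest = hashlib.sha256(image_bytes).hexdigest()
    target_dir = Path(root) / predicted_label
    target_dir.mkdir(parents=True, exist_ok=True)
    path = target_dir / (digest + ".jpg")
    if path.exists():
        return path, False
    path.write_bytes(image_bytes)
    return path, True


# Example: the same bytes submitted twice are stored once.
with tempfile.TemporaryDirectory() as tmp:
    first_path, first_new = save_submission(b"fake image bytes", "unreviewed", tmp)
    second_path, second_new = save_submission(b"fake image bytes", "unreviewed", tmp)
```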