# Ridiculously Simple ML Model Serving

Ridiculously simple model serving:

```sh
kelnerd -m SAVED_MODEL_FILE
```

Kelner ships two commands: `kelnerd`, the model server, and `kelner`, an optional command-line client (see below).
## Install

```sh
$ pip install kelner
```
## Usage

Download and unpack a pretrained Inception model:

```sh
$ wget https://storage.googleapis.com/download.tensorflow.org/models/inception_dec_2015.zip
$ unzip inception_dec_2015.zip
Archive:  inception_dec_2015.zip
  inflating: imagenet_comp_graph_label_strings.txt
  inflating: LICENSE
  inflating: tensorflow_inception_graph.pb
```
Then start the server, pointing it at the downloaded graph and naming its input and output nodes:

```sh
$ kelnerd -m tensorflow_inception_graph.pb --engine tensorflow --input-node ExpandDims --output-node softmax
```
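The `--input-node` and `--output-node` names come from the graph itself. If you are serving your own frozen graph and don't know them, you can list its nodes first; a minimal sketch, assuming TensorFlow 1.x (on 2.x, substitute `tf.compat.v1.GraphDef`):

```python
# Sketch: list the nodes of a frozen TensorFlow graph to find candidate
# input/output node names (for this graph: ExpandDims and softmax).
import tensorflow as tf

graph_def = tf.GraphDef()  # tf.compat.v1.GraphDef on TensorFlow 2.x
with open("tensorflow_inception_graph.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

for node in graph_def.node:
    print(node.op, node.name)
```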
With the server running, send it an image over HTTP:

```sh
$ curl --data-binary "@dog.jpg" localhost:61453 -X POST -H "Content-Type: image/jpeg"
```

The response is a JSON-encoded array of floating-point numbers (the model's softmax scores).
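To make the same request from code rather than curl, any HTTP client works; a minimal sketch using Python's `requests` library, with the port and content type used above:

```python
# Sketch: POST an image to a running kelnerd and decode the JSON response.
import requests

with open("dog.jpg", "rb") as f:
    resp = requests.post(
        "http://localhost:61453",
        data=f.read(),
        headers={"Content-Type": "image/jpeg"},
    )

scores = resp.json()  # a list of floats, e.g. softmax scores
print(len(scores), max(scores))
```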
For a fancier client (not strictly necessary, but convenient) you can use the `kelner` command. This is how you get the top 5 labels from the server you ran above (note the `--top 5` flag):
```sh
$ kelner classify dog.jpg --imagenet-labels --top 5
boxer: 0.973630
Saint Bernard: 0.001821
bull mastiff: 0.000624
Boston bull: 0.000486
Greater Swiss Mountain dog: 0.000377
```
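The top-5 list is just the server's score array sorted and mapped through the label file from the zip above. A sketch of equivalent post-processing, assuming `scores[i]` corresponds to line `i` of `imagenet_comp_graph_label_strings.txt`:

```python
# Sketch: reproduce `kelner classify --imagenet-labels --top 5` by hand.
import requests

with open("dog.jpg", "rb") as f:
    scores = requests.post(
        "http://localhost:61453",
        data=f.read(),
        headers={"Content-Type": "image/jpeg"},
    ).json()

with open("imagenet_comp_graph_label_strings.txt") as f:
    labels = [line.strip() for line in f]

# Assumption: score index i maps to line i of the label file.
for i in sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:5]:
    print(f"{labels[i]}: {scores[i]:.6f}")
```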
## Who is it for?

Machine learning researchers who don't want to deal with building a web server for every model they export.
## How does it work?

Kelner loads a saved Keras or TensorFlow model and starts an HTTP server that pipes the POST request body to the model and returns a JSON-encoded model response.
The server exposes two endpoints:

- `GET`: returns the model's input and output specs as JSON
- `POST`: expects JSON or an image file, returns the JSON-encoded result of model inference
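To make the contract concrete, here is a toy server that speaks the same two-endpoint protocol. This is an illustration only, not Kelner's actual source: `run_model` is a hypothetical stand-in for the loaded model, and the specs payload is made up.

```python
# Toy illustration of the GET-for-specs / POST-for-inference protocol.
# NOT kelner's source code; run_model is a hypothetical placeholder.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_model(body: bytes) -> list:
    """Hypothetical stand-in: decode the request body and run inference."""
    return [0.0, 1.0]

class ModelHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload) -> None:
        data = json.dumps(payload).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def do_GET(self):
        # GET: describe the model's input and output specs.
        self._send_json({"input": "image/jpeg or JSON", "output": "JSON array"})

    def do_POST(self):
        # POST: pipe the request body through the model, return JSON.
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self._send_json(run_model(body))

if __name__ == "__main__":
    HTTPServer(("", 61453), ModelHandler).serve_forever()
```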