It might be worth taking a look at https://github.com/jerryshao/spark-hive-streaming-sink. Under the hood, it seems to build on Hive's streaming support, documented at https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+. I haven’t used it myself yet, but it looks like it does what you’re looking for.