Indexing JSON logs with Parquet
We frequently use Spark SQL and EMR to analyze terabytes of JSON request logs. The built-in JSON support in Spark is easy to use and works well for most use cases. For example, this small piece of code will infer the schema of the files and provide a table that can be queried with standard SQL:
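A minimal PySpark sketch of that idea, assuming Spark 2.x or later and logs stored as one JSON object per line. The helper name `register_json_logs`, the view name `requests`, and the S3 path are hypothetical placeholders for illustration, not the original code.

```python
def register_json_logs(spark, path, view_name="requests"):
    """Read JSON logs, let Spark infer the schema, and expose them to SQL.

    `spark` is expected to be a pyspark.sql.SparkSession.
    """
    # spark.read.json scans the input to infer a schema
    # (by default it expects one JSON object per line).
    df = spark.read.json(path)
    # Register the DataFrame as a temporary view so SQL can reference it by name.
    df.createOrReplaceTempView(view_name)
    return df


def main():
    # Imported here so the helper above stays importable without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-logs").getOrCreate()
    # Hypothetical location; replace with your own bucket and prefix.
    df = register_json_logs(spark, "s3://example-bucket/logs/*.json")
    df.printSchema()  # show the inferred schema
    spark.sql("SELECT COUNT(*) AS n FROM requests").show()
    spark.stop()
```

Calling `main()` on a cluster (for example through `spark-submit` on EMR) would print the inferred schema and a row count. Note that schema inference reads the input data, so on terabytes of logs that step alone can be expensive.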