Run Owl against a JSON file using the -json flag. The -flatten and -multiline options are also available, which help with nested structures and multi-line JSON formats.
-ds json_file_example \
-f s3a://bucket_name/file.json \
-h sandbox-owl.us-east4-c.c.owl-node.internal:5432/postgres \
-master spark://sandbox-owl.us-east4-c.c.owl-node.internal:7077 \
-json \
-flatten \
-multiline
Automatic flattening infers the schema and explodes all struct, array, and map types into flat columns.
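To make the naming convention concrete, here is a minimal sketch of the flattening idea in plain Scala: it walks a nested Map (standing in for a struct schema) and emits flattened column names joined with underscores, the same style the aliases below use (e.g. data_customer_name). This is only an illustration, not Owl's JsonReader implementation.

```scala
object FlattenSketch {
  // Recursively walk a nested Map and emit flattened column names,
  // joining each level of nesting with an underscore.
  def flatten(node: Map[String, Any], prefix: String = ""): Seq[String] =
    node.toSeq.flatMap { case (key, value) =>
      val path = if (prefix.isEmpty) key else s"${prefix}_$key"
      value match {
        case child: Map[_, _] => flatten(child.asInstanceOf[Map[String, Any]], path)
        case _                => Seq(path)
      }
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical schema for illustration only.
    val schema = Map(
      "data" -> Map("customer_name" -> "string", "active_customer" -> "boolean"),
      "id"   -> "long"
    )
    flatten(schema).sorted.foreach(println)
    // prints:
    // data_active_customer
    // data_customer_name
    // id
  }
}
```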
-ds public.json_sample \
-lib "/opt/owl/drivers/postgres/" \
-h sandbox-owl.us-east4-c.c.owl-node.internal:5432/postgres \
-master spark://sandbox-owl.us-east4-c.c.owl-node.internal:7077 \
-q "select * from public.json_sample" \
-rd "2021-01-17" \
-driver "org.postgresql.Driver" \
-cxn postgres-gcp \
-fq "select \
get_json_object(col_3, '$.data._customer_name') AS `data_customer_name` , \
get_json_object(col_3, '$.data._active_customer') AS `data_active_customer` \
from dataset"
Pass the JSON path expressions to Owl's -fq parameter. This is useful for mixed data types within a database, for example when JSON is stored as a string or a blob alongside other column types.
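As a sketch of how such a -fq string can be assembled, the hypothetical helper below builds get_json_object projections from a list of JSON paths. The column name, paths, and dataset name are assumptions for illustration; this is not an Owl API.

```scala
object FqBuilder {
  // Build one get_json_object projection with an alias derived from
  // the JSON path, e.g. $.data._customer_name -> data_customer_name.
  def projection(column: String, jsonPath: String): String = {
    val alias = jsonPath.stripPrefix("$.").replace("._", "_").replace(".", "_")
    s"get_json_object($column, '$jsonPath') AS `$alias`"
  }

  // Join the projections into a single free-form query string.
  def freeformQuery(column: String, paths: Seq[String], dataset: String): String =
    paths.map(projection(column, _)).mkString("select ", " , ", s" from $dataset")

  def main(args: Array[String]): Unit = {
    println(freeformQuery(
      "col_3",
      Seq("$.data._customer_name", "$.data._active_customer"),
      "dataset"))
  }
}
```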
// Flatten
val colArr = new JsonReader().flattenSchema(df.schema)
colArr.foreach(x => println(x))
This Owl utility traverses the entire schema and prints the corresponding get_json_object Spark SQL strings. You can use its output instead of typing each query fragment into the -fq command-line parameter as shown above.
import com.owl.common.options._
import com.owl.core.util.OwlUtils
import com.owl.core.activity2.JsonReader

val connProps = Map(
  "driver"   -> "org.postgresql.Driver",
  "user"     -> "user",
  "password" -> "password",
  "url"      -> "jdbc:postgresql://10.173.0.14:5432/postgres",
  "dbtable"  -> "public.data"
)

// Spark
var rdd = spark.read.format("jdbc").options(connProps).load.select($"col_name").map(x => x.toString()).rdd
var df = spark.read.json(rdd)

// Flatten
val colArr = new JsonReader().flattenSchema(df.schema)
val flatJson = df.select(colArr: _*)
flatJson.cache.count

// Opts
val dataset = "json_example"
val runId = s"2021-01-14"
val opt = new OwlOptions()
opt.dataset = dataset
opt.runId = runId
opt.datasetSafeOff = true

// Owlcheck
OwlUtils.resetDataSource("sandbox-owl.us-east4-c.c.owl-node.internal", "5432/postgres", "danielrice", "owl123", spark)
val owl = OwlUtils.OwlContext(flatJson, opt)
owl.register(opt)
owl.owlCheck