The background to this is simple: you have a Parquet file and want to inspect its schema. The most obvious way is to start spark-shell or pyspark, load the file into a DataFrame, and call printSchema.
But what if, for some reason, none of these easy-to-use tools are available? The harder but very useful alternative is the parquet-tools jar.
- Download the parquet-tools jar from this GitHub link -> https://github.com/viirya/parquet-tools/blob/master/parquet-tools-1.8.1.jar
- Run the following command -> hadoop jar parquet-tools-1.8.1.jar schema -d /path/to/parquet/file
At minimum, the output of the command above shows the file schema, along with a few other details such as the compression algorithm used.
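As a rough illustration, the schema portion of the output is printed in Parquet's message notation (the column names below are hypothetical, not from any specific file):

```text
message spark_schema {
  optional int64 id;
  optional binary name (UTF8);
}
```

Each line shows a column's repetition (required/optional), its physical type (e.g. int64, binary), its name, and, in parentheses, any logical type annotation such as UTF8 for strings.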