Not sure if this is a common problem, but I built a tool to help me quickly understand the main schema, and where most of the data is, for a new JSON file given to me. It makes the assumption that sometimes peer elements in a list will have the same structure (eg they'll be objects with similar sets of keys). If that's true, it learns the structure of the file, prints out the heaviest _aggregated_ path (meaning it thinks in terms of a directory-like structure), as well as giving you various size-per-path hints to help introduce yourself to the JSON file:
Is there a general name for the kind of data structure JSON represents?
We see this kind of Nete’s data all over the place (json, yaml, python dictionaries, toml, etc, etc) and I’m thinking wouldn’t it be nice if we had a path language that worked across these structures, just like how we can regex any strings?
So we can have a pathql executable that we can feed yaml and json data to, but I can reuse the query in Python when I want to extract values from a json stream I just deserialized.
In the past I've used XPath, and CSS selectors using this library to filter and find data in JSON: https://github.com/tomhodgins/espath
The approach is to take the JavaScript object, convert it to XML DOM, run the query (either using standard XPath, or standard CSS selectors) and then either convert the DOM back into objects, or another way I've seen it done is to keep a register of the original objects and retrieve the original objects.
In this way, JSON, and any JavaScript object with non-circularity can be sifted and searched and filtered in reliable ways using already-standardized methods just by using those technologies together in a fun new way.
There is not necessarily a need for inventing a new custom syntax/DSL for querying unless you don't want to make use of CSS and XPath, or have very specific needs.
SQLite includes a subset of JSON Path in the core database these days, used by functions like json_extract()
I wrote up my own detailed notes on that subset a while ago: https://til.simonwillison.net/sqlite/json-extract-path
I've used postgresql's jsonpath support to create user defined filtering rules on db rows. It made things a lot easier than whatever other methods I could come up with.
Insomnia and Bruno has a feature to filter the responses using JSON Path. It's really useful.
Is the necessity of tools like JSON Path really just an indication that APIs are increasingly returning too much junk and / or way more data than the client actually requested and / or needs?
In dev mode, our internal APIs return pretty printed JSON so one can inspect via view-source, more, or text editor.
I look forward to the inevitable JSON path injection attacks given how widespread XPATH injection used to be. ( See https://owasp.org/www-community/attacks/XPATH_Injection for more info. )
JSONPath is good when it comes to querying large JSON documents. But in my opinion, more than this is the need to simplify reading and writing from JSON documents. We use POJOs / model classes which can become a chore for large JSON documents. While it is possible to read paths, I had not seen any tool using which we could read and write JSON paths in a document without using POJOs. And so I wrote unify-jdocs - read and write any JSON path with a single line of code without ever using POJOs. And also use model documents to replace JSONSchema. You can find this library here -> https://github.com/americanexpress/unify-jdocs.
I am kind of surprised that they don't mention jq at all, it seems like a similar tool that is fairly wide spread.
kubectl has JSON Path support built in, it’s very useful
I wish there weren’t so many JSON path syntaxes. I’m comfortable with jq, then there’s JSON path, I forget which one AWS CLI is using, MySQL has their own. It’s impossible for me to get muscle memory with any of them.
Has anyone had performance issues using JSONPath? We are processing large pieces of data per request in a node express service and we believe JSONPath is causing our service to lock up and slow down. We’ve seen the issue improve as we have started refactoring out JSONPath usage for vanilla iteration and conditional checks.
There are a lot of factors at play so we can’t quite put our thumb on JSONPath, but it’s the current suspect and curious if others have run into anything similar.
This is exactly the type of thing that LLMs are very good at generating and explaining for you. I've done this countless times to create and understand regex patterns.
We're using JSONPath to annotate parts of JSON fields in PostgreSQL that need to be extracted/replaced for localization. Whilst I'd naturally prefer we didn't store display copy in these structures, it was a fun thing to implement.
Contrived example:
@localizedModel({
title: {},
quiz: { jsonPath: '$.[question,answerMd]' },
})
class MyQuiz {
title: string;
quiz: JSONObject;
}
1. Create a mock json document that has the structure you are trying to query.
2. Ask [newest LLM] to write the proper json path to get to the element you want to reach.
I've used JSONPath in a small side project that was supposed to be a linter for TypeScript with JSONPath querying over the abstract.syntax tree:
https://github.com/Tade0/permit-a38/tree/master
Ultimately the linting rules proved to be easier to write than read.
Alternative: use jless and copy the path of the thing you want, then (maybe) generailze
If you're on a Mac, OK JSON is a tremendous aid in working with documents and includes a decent JSONPath query dialog. (Very happy user)
These types of languages are a bad idea, just as XPath was. They are complex enough to be a maintenance/bug risk AND don't bring any additional benefit to just writing code in your normal programming language to do the same thing.
You can take my list comprehensions from my cold, dead hands.
There isn't a use case I've seen where these types of mini languages fit well. Ostensibly, you could give it to a user to write to query JSON in a domain-agnostic way in an app but I think it would just confuse most users as well as not being powerful enough for half of their use cases.
Sometimes it's better just to write code.
there is possibly a need for more unified standard across different implementations particularly from a software development and API design perspective.
During parsing and manipulation of JSON data, the syntactical discrepancies/behaviours between various libraries might need a common specification, for interoperability.
features like type-aware queries or schema validation, may be very helpful.
One of the best tools I've found for manipulating JSON via JsonPath syntax is https://jsonata.org
In addition to simple queries that allow you to select one (or multiple) matching nodes, it also provides some helper functions, such as arithmetic, comparisons, sorting, grouping, datetime manipulation, and aggregation (e.g. sum, max, min).
It's written in JS and can be used in Node or in the browser, and there's also a Python wrapper: https://pypi.org/project/pyjsonata/