How to Extract Elements from Thousands of Lines of JSON? The answer is... 🧐
JSON, short for JavaScript Object Notation, was discovered and defined by Douglas Crockford in 2001. While a 22-year-old college student is just graduating, JSON at 22 has become one of the standards for Internet data exchange and is well known to developers.
As one of the developers, I’ve read countless JSON files. JSON can be fat or thin, long or short. A short JSON can be a simple array containing only a Boolean value:
[true]
And a long JSON can be astonishingly tens of thousands of lines:
When reading long JSON files, the most common requirement is to extract the value of a field or traverse the object values of an array, all of which are buried deep in the river of JSON.
For a programming novice, the first thought would be to use programming to get the desired field, suppose we have a JSON structure as follows:
{
"human": {
"person": {
"man": [
{
"name": "Jack",
"age": "17"
},
{
"name": "Mike",
"age": "32"
},
{
"name": "John",
"age": "23"
},
{
"name": "David",
"age": "41"
},
{
"name": "Eric",
"age": "29"
},
{
"name": "Chris",
"age": "38"
},
{
"name": "Tom",
"age": "27"
},
{
"name": "Peter",
"age": "35"
},
{
"name": "Robert",
"age": "26"
},
{
"name": "Daniel",
"age": "33"
}
]
}
}
}
Now I need to get the name
of all elements in the man
array, so we would definitely write this JS code:
const nameSet = data.human.person.man.map(manItem => manItem.name)
Thanks to the syntactic sugar and APIs of modern JavaScript, the code is concise. But while the code is simple, there are many issues to consider: one is the need to save this code file, two is the need for an environment to run this code, three is the need to print or save the execution result of the code, and four is the need to continuously debug this code and handle exception and edge cases and so on.
The programming novice continues to ponder, there are many scenarios to index values in JSON, clumsy programming seems not to be an efficient method, so he wants to rely on the currently popular ChatGPT:
🧑💻 I have the above JSON structure, help me extract the
name
field of all elements under theman
field.🤖 I’ll help you extract the
name
field of all elements underman
, the result is:
Jack
Mike,
John,
David,
Eric,
Chris,
Tom,
Peter,
Robert,
Daniel
The programming newbie was thrilled to get the result, but soon encountered another problem. When pasting tens of thousands of lines of JSON into the ChatGPT input box, they quickly received an error message:
Even the powerful ChatGPT succumbs in the face of extremely long and complex text. Of course, there are solutions, like inputting in segments, or programmatically calling the ChatGPT API. But if we resort to this, we are back to square one.
So, is there a better way? 🧐
The solution must be JSON Path! Let’s take the He3 JSON Path tool as an example. To get the name
field from the previous example, we just need to enter $..name
in the input box:
Upon first seeing $..name
, you might be confused. What kind of syntax is this? How does it work?
Let me introduce JSON Path:
JSON Path is an expression language for locating and extracting specific elements in JSON data. It provides a concise syntax that makes extracting data from complex JSON structures easy.
In simple terms, JSON Path is like Markdown. It’s a lightweight syntax that can extract fragments from a JSON structure.
As a language, JSON Path requires a certain amount of learning to grasp its syntax and keywords. But rest assured, the learning curve for JSON Path is not steep. Once mastered, you can extract specific key values, traverse arrays, filter elements based on conditions, perform slicing, and more. Handling JSON data becomes a breeze.
Let’s first introduce some common JSON Path syntax:
$
: Root node, representing the outermost layer of JSON data..
: Child node operator, used to access properties in an object.[]
: Index operator, used to access elements in an array or filter elements by conditions.*
: Wildcard, used to match any property name or array index...
: Recursive descent operator, used to search all levels in a nested structure.@
: Current node, can be used to reference the current node in a filtering condition.
Assume we add a "god": "God"
key-value pair in the JSON example mentioned earlier, and we want to use JSON Path to get the god
field, we can use $.god
:
For $..name
, we know it means “return the values of the name
key at all levels in the JSON data”. If we want to get all age
fields, we can change it to $..age
:
If we want to get the first element of the man
array, we can input $.
human.person.man
[0]
:
Of course, using the recursive descent operator is even more convenient $..man[0]
:
The index reference operator []
has even more advanced operations.
For instance, to get the first, second, and third elements, you can input $..man[0,1,2]
:
For getting 1
, 2
, 3
, such consecutive array elements, JSON Path provides a more convenient expression [start:end]
. The above can be changed to $..man[0:3]
:
❗️ Note that the expression [0,3
includes the start
but not the end
.
JSON Path also provides two types of expressions:
()
: Expression used for condition judgments or logical operations. You can use comparison operators (such as>
,<
,==
etc.) and logical operators (such as&&
,||
) to define conditions inside the parentheses. For instance,(@.length)
represents getting the last element of an array.?()
: Filter expression. In?()
, you can use any legal JavaScript expression to filter elements. Such expressions are calculated inside the filter and decide whether to select or exclude the current element based on their result. For instance,[?(@.age > 25)]
filters elements based on the “age” attribute, selecting those where the age is greater than 25.
With these two expressions, we can further improve the accuracy of data extraction. For instance, to get the last element of man
, we can use $..man[(@.length-1)]
:
@
represents the current element, .length
represents the length of the current element, so (@.length-1)
can be used to get the last element of an array.
If we want to filter out people with age
greater than or equal to 33
in the man
field, we can use the filter expression $..man[?(@.age >= 33)]
:
The above covers most of the commonly used JSON Path syntax. For less commonly used syntax, you can refer to the JSON Path Plus implementation.
JSON Path doesn’t have an official standard document, but there are some widely accepted and used implementations and documents. JSON Path Plus is one of them, and the He3 JSON Path tool adopts JSON Path Plus.
Returning to the main topic, in the following tens of thousands of lines of JSON:
If you want to see if there are friends
named Ellen Rowland
, you can: