5 May

JSONPath in PostgreSQL: committing patches and selecting apartments

Postgres Professional corporate blogPostgreSQL

This article was written in Russian in 2019 after the PostgreSQL 12 feature freeze, and it is still up-to-date. Unfortunately other patches of the SQL/JSON will not get even into version 13.
Many thanks to Elena Indrupskaya for the translation.

JSONPath


All that relates to JSON(B) is relevant and of high demand in the world and in Russia, and it is one of the key development areas in Postgres Professional. The jsonb type, as well as functions and operators to manipulate JSON/JSONB, appeared as early as in PostgreSQL 9.4. They were developed by the team lead by Oleg Bartunov.

The SQL/2016 standard provides for JSON usage: the standard mentions JSONPath — a set of functionalities to address data inside JSON; JSONTABLE — capabilities for conversion of JSON to usual database tables; a large family of functions and operators. Although JSON has long been supported in Postgres, in 2017 Oleg Bartunov with his colleagues started their work to support the standard. Of all described in the standard, only one patch, but a critical one, got into version 12; it is JSONPath, which we will, therefore, describe here.

At elder times people used to store JSON in text fields. In 9.3 a special data type for JSON was added, but related functionality was rather poor, and queries with this type were slow because of time spent on parsing the text representation of JSON. So, a lot of users who hesitated to start using Postgres had to choose NoSQL databases instead. Performance of Postgres increased in 9.4, when, thanks to O. Bartunov, A. Korotkov and T. Sigaev, a binary variant of JSON appeared in Postgres — the type jsonb.
Since it is not necessary to parse jsonb each time, it is much faster to use this type. Of functions and operators that appeared at the same time as jsonb, some only work with the new, binary, type; for example: an important containment operator @>, which checks whether an element or array is contained in JSONB:

SELECT '[1, 2, 3]'::jsonb @> '[1, 3]'::jsonb;

returns TRUE since the right-side array is contained in the array on the left. But

SELECT '[1, 2, [1, 3]]'::jsonb @> '[1, 3]'::jsonb;

returns FALSE because of a different nesting level, which must be specified explicitly. For the jsonb type, an existence operator ? (question mark) is introduced, which checks whether a string is an object key or an array element at the top level of the JSONB value, as well as two similar operators (for details refer here). They are supported by GIN indexes with two GIN operator classes. The-> (arrow) operator allows «moving» across JSONB; it returns the value by a key or by an array index in the case of an array. There are a few more operators for movement. However, it is not possible to define filters with a functionality similar to WHERE. It was a breakthrough: thanks to jsonb, the popularity of Postgres as an RDBMS with NoSQL features began to grow.

In 2014 A. Korotkov, O. Bartunov and T. Sigaev developed the jsquery extension, eventually included in Postgres Pro Standard 9.5 (and higher Standard and Enterprise versions). It adds extremely broad capabilities to work with json(b). This extension defines a query language to retrieve data from json(b), along with indexes to accelerate execution of the queries. This functionality was in demand with users, who were not ready to wait for the standard and for inclusion of the new functions in a «vanilla» version. The fact that the development was sponsored by Wargaming.net proves the practical value of the extension. A special type, jsquery, is implemented in the extension.

Queries in this language are compact and look similar to the following:

SELECT '{"apt":[{"no": 1, "rooms":2}, {"no": 2, "rooms":3}, {"no": 3, "rooms":2}]}'::jsonb @@ 'apt.#.rooms=3'::jsquery;

The query asks whether there are three-room apartments in the house. We have to explicitly specify the type jsquery since the @@ operator is also available in the jsonb type now. The specification of jsquery is here, and a presentation with numerous examples is here.

So, everything needed to work with JSON has already been available in Postgres, and then the SQL:2016 standard appeared. The semantics turned out to be not so different from ours in the jsquery extension. It may be that the authors of the standard were looking at jsquery from time to time while inventing JSONPath. Our team had to re-implement what we already had a bit differently and of course, to implement plenty of new as well.

More than a year ago, at the March commitfest, the fruits of our programmer efforts were proposed to the community as 3 big patches that support the SQL:2016 standard:

SQL/JSON: JSONPath;
SQL/JSON: functions;
SQL/JSON: JSON_TABLE.

But to develop a patch is only half the journey, it is also uneasy to promote patches, especially if they are big and involve multiple modules. A lot of review-update iterations are required, and quite a few resources (man-hours) need to be invested in promotion of the patch. The chief architect of Postgres Professional Alexander Korotkov himself undertook this task (thanks to the committer status he has now) and got the JSONPath patch committed, and this is the main patch in this series. The second and third patches have the Needs Review status now. JSONPath, on which the efforts were focused, enables manipulating the JSON(B) structure and is flexible enough to extract its elements. Of 15 items specified in the standard, 14 are implemented, which is greater than in Oracle, MySQL and MS SQL.

The notation in JSONPath differs from Postgres operators for manipulating JSON and from the notation of JSQuery. The hierarchy is denoted by dots:

$.a.b.c (in the notation of Postgres 11 we would have it as 'a'->'b'->'c');
$ is the current context of the element — the expression with $ actually specifies the json(b) area to process and, in particular, to be used in a filter, the remaining part is unavailable for manipulation in this case;
@ is the current context in the filter expression — all paths available in the expression with $ are went through;
[*] is an array;
* is a wildcard relevant to the expression with $ or @, it replaces any value on a path within one level of the hierarchy (between two dots in the dot notation);
** is a wildcard relevant to the expression with $ or @, it replaces any value on a path regardless of the hierarchy — it is very convenient to use if we are unaware of the nesting level of the elements;
the "?" operator enables specifying a filter similar to WHERE:
$.a.b.c? (@.x > 10);
$.a.b.c.x.type(), as well as size(), double(), ceiling(), floor(), abs(), datetime(), keyvalue() are methods.
A query with the jsonb_path_query function (functions will be discussed further) may look as follows:

SELECT jsonb_path_query_array('[1,2,3,4,5]', '$[*] ? (@ > 3)');
 jsonb_path_query_array 
------------------------
 [4, 5]
(1 row)

Although a special patch with functions has not been committed, the JSONPath patch already contains key functions to manipulate JSON(B):


jsonb_path_exists('{"a": 1}', '$.a') returns true (is called by the "?" operator)
jsonb_path_exists('{"a": 1}', '$.b') returns false

jsonb_path_match('{"a": 1}', '$.a == 1') returns true (called by the "@>" operator)
jsonb_path_match('{"a": 1}', '$.a >= 2') returns false

jsonb_path_query('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 2)') returns 3, 4, 5
jsonb_path_query('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 5)') returns 0 rows

jsonb_path_query_array('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 2)') returns [3, 4, 5]
jsonb_path_query_array('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 5)') returns []

jsonb_path_query_first('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 2)') returns 3
jsonb_path_query_first('{"a": [1,2,3,4,5]}', '$.a[*] ? (@ > 5)') returns NULL

Note that the equality in JSONPath expressions is a single "=" vs. double in jsquery: "==".

For more elegant illustrations, we will generate JSONB in a one-column table house:

CREATE TABLE house(js jsonb);
INSERT INTO house VALUES
('{
	"address": {
		   "city":"Moscow",
		   "street": "Ulyanova, 7A"
	},
	"lift": false,
	"floor": [
		 {
			"level": 1,
		 	"apt": [
		       	       {"no": 1, "area": 40, "rooms": 1},
		       	       {"no": 2, "area": 80, "rooms": 3},
		       	       {"no": 3, "area": 50, "rooms": 2}
			]
		},
		{
			"level": 2,
			"apt": [
		       	       {"no": 4, "area": 100, "rooms": 3},
		       	       {"no": 5, "area": 60, "rooms": 2}
			]
		}
	]
}');


Fig. 1 The JSON tree with apartment leaves visualized.

This JSON looks weird, doesn’t it? It has a confusing hierarchy, but it's a real-life example, and in real life, you often have to deal with what there is and not with what there must be. Armed with the capabilities of the new version, let's find apartments on the 1-st and 2-nd floors, but not the first on the list of apartments of the floor (which are colored in green in the tree):

SELECT jsonb_path_query_array(js, '$.floor[0, 1].apt[1 to last]')
FROM house;
---------------------
[{"no": 2, "area": 80, "rooms": 3}, {"no": 3, "area": 50, "rooms": 2}, {"no": 5, "area": 60, "rooms": 2}]

In PostgreSQL 11 we will have to ask like this:

SELECT jsonb_agg(apt) FROM (
        SELECT apt->generate_series(1, jsonb_array_length(apt) - 1) FROM (
             SELECT js->'floor'->unnest(array[0, 1])->'apt' FROM house
        ) apts(apt)
) apts(apt);

Asking a very simple question now: are there any strings that contain the value of «Moscow» (anywhere)? It is really simple:

SELECT jsonb_path_exists(js, '$.** ? (@ == "Moscow")') FROM house;

In version 11, we would have to create a huge script:

WITH RECURSIVE t(value) AS (
      SELECT * FROM house UNION ALL (
            SELECT COALESCE(kv.value, e.value) AS value
            FROM t
            LEFT JOIN LATERAL jsonb_each (
                CASE WHEN jsonb_typeof(t.value) = 'object' THEN t.value
                ELSE NULL END
            ) kv ON true
            LEFT JOIN LATERAL jsonb_array_elements (
                CASE WHEN jsonb_typeof(t.value) = 'array' THEN t.value
                ELSE NULL END
            ) e ON true
            WHERE kv.value IS NOT NULL OR e.value IS NOT NULL
      )
) SELECT EXISTS (SELECT 1 FROM t WHERE value = '"Moscow"');


Fig. 2 The JSON tree — Moscow found!

Now searching for an apartment on any floor with the area from 40 to 90 square meters:

select jsonb_path_query(js, '$.floor[*].apt[*] ? (@.area > 40 && @.area < 90)')="" from="">
         jsonb_path_query          
-----------------------------------
 {"no": 2, "area": 80, "rooms": 3}
 {"no": 3, "area": 50, "rooms": 2}
 {"no": 5, "area": 60, "rooms": 2}
(3 rows)

Using our JSON to find apartments with the numbers larger than 3-rd:

SELECT jsonb_path_query(js, '$.floor.apt.no ? (@>3)') FROM house;
 jsonb_path_query 
------------------
 4
 5
(2 rows)

And this is how jsonb_path_query_first works:

SELECT jsonb_path_query_first(js, '$.floor.apt.no ? (@>3)') FROM house;
 jsonb_path_query_first 
------------------------
 4
(1 row)

It turns out that only the first value that meets the filtering condition is selected.

A Boolean operator JSONPath for JSONB — @@ — is called a match operator. It computes a JSONPath predicate by calling the jsonb_path_match_opr function.

Another Boolean operator — @? — is the existence check, it answers the question whether a JSONPath expression will return SQL/JSON objects and calls the jsonb_path_exists_opr function:

checking '[1,2,3]' @@ '$[*] == 3' returns true;
and '[1,2,3]' @? '$[*] @? (@ == 3)' also returns true

We can get the same result using different operators:


js @? '$.a' is equivalent to js @@ 'exists($.a)'
js @@ '$.a == 1' is equivalent to js @? '$ ? ($.a == 1)'

JSONPath Boolean operators are attractive because they are supported and accelerated by GIN indexes. jsonb_ops and jsonb_path_ops are the appropriate operator classes. In the example below we turn off SEQSCAN since we manipulate a small table (for large tables the optimizer will choose Bitmap index on its own):

SET ENABLE_SEQSCAN TO OFF;
CREATE INDEX ON house USING gin (js);
EXPLAIN (COSTS OFF) SELECT * FROM house
      WHERE js @? '$.floor[*].apt[*] ? (@.rooms == 3)';
QUERY PLAN
--------------------------------------------------------------------------------
Bitmap Heap Scan on house
   Recheck Cond: (js @? '$."floor"[*]."apt"[*]?(@."rooms" == 3)'::jsonpath)
   -> Bitmap Index Scan on house_js_idx
       Index Cond: (js @? '$."floor"[*]."apt"[*]?(@."rooms" == 3)'::jsonpath)
(4 rows)

All functions like jsonb_path_xxx() have the same signature:

jsonb_path_xxx(
js jsonb,
jsp jsonpath,
vars jsonb DEFAULT '{}',
silent boolean DEFAULT false
)

vars — is an argument of JSONB type to pass variables into a JSONPath expression:

SELECT jsonb_path_query_array('[1,2,3,4,5]', '$[*] ? (@ > $x)',
vars => '{"x": 2}');
jsonb_path_query_array
------------------------
[3, 4, 5]

It is tricky to go without vars when doing a join where one of the tables contains a field of type jsonb. Say, we are developing an application that searches apartments in that very house for employees, who recorded their requirements for a minimal area in a questionnaire:

CREATE TABLE demands(name text, position text, demand int);
INSERT INTO demands VALUES ('Gabe','boss', 85), ('Fabe','junior hacker', 45);
SELECT jsonb_path_query(js, '$.floor[*].apt[*] ? (@.area >= $min)', vars => jsonb_build_object('min', demands.demand)) FROM house, demands WHERE name = 'Fabe';
-[ RECORD 1 ]----+-----------------------------------
jsonb_path_query | {"no": 2, "area": 80, "rooms": 3}
-[ RECORD 2 ]----+-----------------------------------
jsonb_path_query | {"no": 3, "area": 50, "rooms": 2}
-[ RECORD 3 ]----+-----------------------------------
jsonb_path_query | {"no": 4, "area": 100, "rooms": 3}
-[ RECORD 4 ]----+-----------------------------------
jsonb_path_query | {"no": 5, "area": 60, "rooms": 2}

Lucky Fabe can select among four apartments. But as soon as we change one letter in the query from «F» to «G», there will be no choice! Only one apartment will be suitable.

One more keyword yet to mention: silent — is a flag to suppress error handling, which becomes the programmer's responsibility.

SELECT jsonb_path_query('[]', 'strict $.a');
ERROR: SQL/JSON member not found
DETAIL: jsonpath member accessor can only be applied to an object

An error occurred. But this way it won't occur:

SELECT jsonb_path_query('[]', 'strict $.a', silent => true);
jsonb_path_query
------------------
(0 rows)

By the way, mind the errors! According to the standard, numeric errors do not generate error messages, so it's the programmer's responsibility to handle them:

SELECT jsonb_path_query('[1,0,2]', '$[*] ? (1/ @ >= 1)');
jsonb_path_query
------------------
1
(1 row)

When computing the expression, array values are went through, which include zero, but division by zero does not generate an error.

Functions will operate differently depending on the mode selected: Strict or Lax (chosen by default). Assume that we are searching for a key using the Lax mode in a JSON that does not contain that key:

SELECT jsonb '{"a":1}' @? 'lax $.b ? (@ > 1)';
?column?
----------
f
(1 row)

Now using the Strict mode:

SELECT jsonb '{"a":1}' @? 'strict $.b ? (@ > 1)';
?column?
----------
(null)
(1 row)

That is, where we got FALSE in a the «lax» mode, in the «strict» mode we got NULL.

In the Lax mode, an array with a complex hierarchy [1,2,[3,4,5]] always unfolds to [1,2,3,4,5]:

SELECT jsonb '[1,2,[3,4,5]]' @? 'lax $[*] ? (@ == 5)';
?column?
----------
t
(1 row)

In the Strict mode, number «5» won't be found since it is at the bottom level of the hierarchy. To find it, we will have to change the query by replacing "@" with "@[*]":

SELECT jsonb '[1,2,[3,4,5]]' @? 'strict $[*] ? (@[*] == 5)';
?column?
----------
t
(1 row)

В PostgreSQL 12, JSONPath is a data type. The standard says nothing of a need for the new type; this is a matter of implementation. With the new type, we gain full-featured work with jsonpath by means of operators and accelerating indexes that are already available for JSONB. Otherwise, we would have to integrate JSONPath at the level of executor and optimizer codes.

You can read about SQL/JSON syntax here, for example.

Some examples in this article are taken from Oleg Bartunov's presentation at Saint Highload++ in St. Petersburg on April 9, 2019.

Oleg Bartunov's blog touches upon SQL/JSON standard-2016 conformance for PostgreSQL, Oracle, SQL Server and MySQL.

Here you can find a presentation on SQL/JSON.

And here is the introduction to SQL/JSON.
Tags:postgresqlpostgresdbmsrdbmsdata bases
Hubs: Postgres Professional corporate blog PostgreSQL
+6
1.7k 5
Comments 5