**Calcite is the most widely used SQL parsing and optimization engine in the big data field. It powers SQL parsing for large-scale projects such as Flink, Hive, Kylin, and Druid. If you want to study the Flink SQL source code in depth, Calcite is a prerequisite skill and well worth learning.**

**Internally, we also use a self-developed SQL engine built on Calcite to support correlated queries across any number of heterogeneous data sources through a single SQL interface (e.g., joining a MySQL table with an HBase table and then running an aggregation on the result).**

**Because Calcite covers a lot of ground, this article focuses on the source code of Calcite's main execution flow, sorting out the stages of Calcite SQL execution, with an emphasis on the VolcanoPlanner optimizer.**

**To sum up, the flow is:**

**1. Parse the incoming SQL into a syntax tree via the Parser, with SqlNode as the tree's node type.**

**2. Validate the tree: syntax validation, type checking, metadata verification, etc.**

**3. Convert the validated SqlNode tree into the corresponding relational algebra expression, which is also a tree, with RelNode as the node type.**

**4. Optimize the RelNode relational algebra tree with the two built-in optimizers, VolcanoPlanner and HepPlanner, to obtain the optimal logical algebra tree, which is still made of RelNodes.**

**5. Convert the optimal logical algebra expression (RelNode) into an executable physical plan (the conversion logic varies by framework); Flink, for example, converts it into its Operators.**

**Now let's look at each stage in detail.**

**1. Parsing SQL statements into a syntax tree (SQL -> SqlNode)**

**This stage is not actually hand-written in Calcite; rather, Calcite defines a set of SQL grammar rule templates from which the parser is generated by the JavaCC framework.**

**Pull the code and take a look.**

**Parser.jj in the source code is Calcite's core grammar template. For example, any syntax we want to add to Flink SQL, such as a count window, is added by modifying this file: it defines which SQL tokens are recognized and the specific logic for how they are turned into SqlNodes.**

**Take an example:**

**`select ID, NAME from MYHBASE.MYHBASE where ID = '1'`**

**will be parsed into a SqlNode tree.**
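**As a rough sketch of what that tree contains (using simplified stand-in types here, not Calcite's real SqlNode classes):**

```java
import java.util.List;

// Simplified stand-ins for Calcite's SqlNode hierarchy (toy types for
// illustration only, not the real org.apache.calcite.sql classes).
public class SqlNodeSketch {
    interface Node {}
    record Identifier(List<String> names) implements Node {}          // ID, or MYHBASE.MYHBASE
    record Literal(String value) implements Node {}                   // '1'
    record Call(String op, Node left, Node right) implements Node {}  // ID = '1'
    record Select(List<Node> selectList, Node from, Node where) implements Node {}

    // The shape of the tree parsed from:
    //   select ID, NAME from MYHBASE.MYHBASE where ID = '1'
    static Select exampleTree() {
        return new Select(
            List.of(new Identifier(List.of("ID")), new Identifier(List.of("NAME"))),
            new Identifier(List.of("MYHBASE", "MYHBASE")),
            new Call("=", new Identifier(List.of("ID")), new Literal("1")));
    }

    public static void main(String[] args) {
        Select tree = exampleTree();
        System.out.println(tree.selectList().size() + " select items"); // 2 select items
    }
}
```

**The real SqlSelect node likewise holds a select list, a from clause, and a where clause, each of which is itself a SqlNode.**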

**JavaCC itself is not covered in detail here; you can refer to the official website (https://javacc.github.io/javacc/).**

**2. Syntax validation (validator) stage**

**Here the tree is checked by the validator. This is not the focus, so it is not expanded here.**

**3. Converting the SqlNode tree into a logical relational expression tree (SqlNode -> RelNode)**

**Calcite ships a default SQL-to-rel converter, org.apache.calcite.sql2rel.SqlToRelConverter; it will not be expanded here either.**

**4. Relational algebra tree optimization (RelNode -> RelNode)**

**This is the key focus here! The reason so many frameworks choose Calcite is precisely its SQL optimization.**

**After the first three stages we have a RelNode tree, but this tree is not yet the optimal solution; Calcite obtains the optimized, best tree through its two optimizer planners.**

**This is the core of all of Calcite. Calcite provides two optimizers:**

**HepPlanner, the rule-based optimizer: simply put, many rules (Rule) are defined, and whenever a tree node matches an optimization rule it is transformed according to that rule, yielding a rule-optimized tree. Relatively simple.**

**VolcanoPlanner, the cost-based optimizer: driven by cost, it iterates over the tree applying rules, continually recomputing the cost of the root RelNode to find the optimal tree.**
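**The Hep style can be sketched with a toy pass (hypothetical types and a toy rule, not Calcite's API): rules are applied greedily until nothing matches anymore, with no cost model involved:**

```java
import java.util.List;

// Toy Hep-style optimizer sketch: apply a rewrite rule until nothing changes.
// (Illustration only; Calcite's real HepPlanner works on RelNode graphs.)
public class HepSketch {
    interface Plan {}
    record Scan(String table) implements Plan {}
    record Filter(String cond, Plan input) implements Plan {}
    record Project(List<String> fields, Plan input) implements Plan {}

    // Toy FilterMergeRule: Filter(c1, Filter(c2, x)) -> Filter(c1 AND c2, x)
    static Plan rewrite(Plan p) {
        if (p instanceof Filter f && f.input() instanceof Filter inner) {
            return rewrite(new Filter(f.cond() + " AND " + inner.cond(), inner.input()));
        }
        if (p instanceof Filter f) return new Filter(f.cond(), rewrite(f.input()));
        if (p instanceof Project pr) return new Project(pr.fields(), rewrite(pr.input()));
        return p; // Scan: nothing to do
    }

    public static void main(String[] args) {
        Plan plan = new Project(List.of("ID", "NAME"),
                new Filter("ID = '1'", new Filter("NAME IS NOT NULL", new Scan("a"))));
        System.out.println(rewrite(plan)); // the two filters are merged into one
    }
}
```

**VolcanoPlanner, by contrast, keeps all equivalent forms it discovers and picks among them by cost, as we will see below.**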

**Let's first look at `select ID, NAME from a where ID = '1'`: what does the RelNode tree converted from this SQL look like? Roughly, a LogicalProject on top of a LogicalFilter on top of a LogicalTableScan.**

**You can see that many of the nodes are prefixed with Logical. That is because these are logical nodes produced by Calcite's default converter (SqlToRelConverter) in stage 3; logical nodes carry no physical properties and cannot be run as-is.**

**Next, we step into Calcite's cost-based optimizer, VolcanoPlanner, for optimization.**

**The optimizer's job is to return the solution with the best cost. Let's walk through Calcite's optimization method.**

**First, Calcite sets the RelNode obtained from the previous stage as the root of the Volcano cost optimizer, which takes us into the method org.apache.calcite.plan.volcano.VolcanoPlanner.registerImpl().**

**Stepping through with a breakpoint, we see that in the ensureRegistered method the RelNode's inputs are registered first.**

**During the registration process you can see a loop back into registerImpl(): in other words, the tree's child nodes are registered first, in a depth-first traversal.**

**Next, since registration is a depth-first traversal, let's go back to the VolcanoPlanner.registerImpl() method we just looked at and see what the registration process actually does.**

**You can see that rules are about to be fired. Here we need to introduce a concept: the Rule in Calcite.**

**From the class description we know that a rule can convert one expression into another. What does that mean? Let's look at its abstract methods.**

**To sum up, there are two core methods:**

**matches() returns whether the current RelNode matches this rule.**

**onMatch() is called when the rule matches; inside it, transformTo() can be called to convert one RelNode into another.**

**Rules are the core of all of Calcite. In fact, every SQL optimization is built from corresponding rules: the optimization logic is implemented as a rule, the matching RelNode tree nodes are transformed accordingly, and the result is the optimal execution plan.**
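**The shape of a rule can be sketched as a toy interface (hypothetical, mirroring the two methods above rather than the real org.apache.calcite.plan.RelOptRule API):**

```java
// Toy sketch of the two core methods of a Calcite rule.
// (Illustration only; in the real API, onMatch receives a RelOptRuleCall
// and registers the new node via call.transformTo(newNode).)
public class RuleSketch {
    interface RelNode { String describe(); }

    interface Rule {
        boolean matches(RelNode rel);   // does this rule apply to the node?
        RelNode onMatch(RelNode rel);   // if so, build the equivalent node
    }

    record TableScan(String table) implements RelNode {
        public String describe() { return "TableScan(" + table + ")"; }
    }
    record IndexScan(String table) implements RelNode {
        public String describe() { return "IndexScan(" + table + ")"; }
    }

    // Toy rule: replace a TableScan with an equivalent IndexScan.
    static final Rule SCAN_TO_INDEX = new Rule() {
        public boolean matches(RelNode rel) { return rel instanceof TableScan; }
        public RelNode onMatch(RelNode rel) {
            return new IndexScan(((TableScan) rel).table());
        }
    };

    public static void main(String[] args) {
        RelNode scan = new TableScan("a");
        if (SCAN_TO_INDEX.matches(scan)) {
            System.out.println(SCAN_TO_INDEX.onMatch(scan).describe()); // prints IndexScan(a)
        }
    }
}
```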

**OK, back to our main flow. Continuing into the VolcanoPlanner.fireRules() method, let's see how rules are triggered.**

**The logic here is relatively simple: when a RelNode satisfies a rule, the match() method of VolcanoRuleCall is called.**

**One thing to note: classOperands here maps a RelNode class to all the rule operands that might match it, whether the RelNode sits at the top of the rule's operand pattern or further down.**

**Say I have a LogicalFilter RelNode and define two rules:**

**RuleA: operand(LogicalFilter.class, operand(TableScan.class))**

**RuleB: operand(LogicalProject.class, operand(LogicalFilter.class))**

**Then both rules enter the classOperands mapping as possible matches for the LogicalFilter.**

**Now let's continue with the code that runs after a rule is matched.**

**We then arrive at the onMatch() of VolcanoPlanner.DeferringRuleCall.**

**What happens here is that the rule is added to the ruleQueue in IterativeRuleDriver, a queue used specifically to store matched rules. Notice that at this point the matched rules merely sit in the queue; none of them has been executed yet.**

**So when do they get executed?**

**Going back to the main flow: once setRoot() has registered the root RelNode and all of its children, the planner's findBestExp() method runs; as the name suggests, it finds the optimal expression.**

**One thing to state up front: Calcite's optimization assumes that if an expression is optimal, then its subexpressions are also optimal. So for the best plan of the current RelNode, we only need to consider:**

**1. The combination of the best plans of its child nodes.**

**2. Every rule it can match, combined with the best plans of the remaining parts.**

**Comparing these candidates yields the optimal solution for the current RelNode.**

**To borrow a diagram: if A can only match those two rules, then when enumerating A's optimal solution we only need to consider those few situations.**

**If you are not familiar with the principle, take a look at this article: https://io-meter.com/2018/11/01/sql-query-optimization-volcano/**
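**This optimality principle can be sketched as a small recursion (a toy cost model with made-up numbers, not Calcite's real RelOptCost):**

```java
import java.util.List;

// Toy sketch of the optimality principle used by the Volcano optimizer:
// the best cost of an equivalence set is its cheapest member, where each
// member costs its own cost plus the best costs of its child sets.
public class BestCostSketch {
    record Rel(String name, double selfCost, List<EquivSet> children) {}

    record EquivSet(List<Rel> members) {
        double bestCost() {
            return members.stream()
                .mapToDouble(m -> m.selfCost()
                    + m.children().stream().mapToDouble(EquivSet::bestCost).sum())
                .min().orElseThrow();
        }
    }

    public static void main(String[] args) {
        // Scan of "a" has two equivalent forms: a full TableScan or an IndexScan.
        EquivSet scan = new EquivSet(List.of(
            new Rel("TableScan(a)", 100, List.of()),
            new Rel("IndexScan(a)", 10, List.of())));
        // The filter on top only needs the best plan of its child set.
        EquivSet filter = new EquivSet(List.of(
            new Rel("Filter(ID='1')", 5, List.of(scan))));
        System.out.println(filter.bestCost()); // prints 15.0
    }
}
```

**Because each node only ever consults the best of its children, the search never has to re-enumerate full subtrees.**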

**Now look at findBestExp().**

**Here is the main loop of the whole optimization, which searches for the optimal solution bestExp: it keeps taking rules from the queue and running them, and does not exit until all the rules have been executed.**

**That's right, this queue is the one mentioned earlier: when a RelNode is registered, its matching rules are put into this queue.**

**A natural question arises here. As mentioned earlier, running a rule changes the RelNode tree, i.e., it adds equivalent nodes. Such structural changes can make a rule match where it could not before, so will those rules be missed? To see why not, look at transformTo(), the method used in a rule's onMatch() to register equivalent nodes.**

**Inside the transformTo method, the newly transformed node is registered again.**

**That is to say, the new node also goes through the default RelNode registration process: when it is registered as an equivalent node, new rule matches are found and added to the ruleQueue to await the next round of execution, so no match is missed.**

**In addition, when a RelNode is transformed by a rule, the newly generated RelNode is added to that RelNode's set of equivalent nodes.**

**When an equivalent node is added, the propagateCostImprovement method calculates whether the new equivalent node lowers the cost of the current RelNode; if it does, the best cost of the current RelNode is updated, and the improvement bubbles up to update the best costs of the parent RelNodes.**

**To summarize the loop:**

**1. while(true): keep pulling rules from the ruleQueue and firing them until the queue is empty.**

**2. A fired rule adds new equivalent nodes; if an equivalent node has a better cost, the best RelNode of the whole tree is updated.**

**3. The new equivalent RelNodes match new rules, and those rules are added to the ruleQueue.**

**4. Enter the next iteration, until there are no rules left to match; findBestExp() then returns the optimized, optimal RelNode.**

**Finally, each framework translates this optimal RelNode into its own API.**
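**That whole loop can be condensed into a sketch (a toy simplification with hypothetical types and rules; the real driver lives in org.apache.calcite.plan.volcano.IterativeRuleDriver):**

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Toy sketch of the Volcano main loop: fire queued matches, register the
// equivalents they produce (which may enqueue new matches), keep the cheapest
// plan seen, and stop when the queue is empty.
public class VolcanoLoopSketch {
    record Plan(String desc, double cost) {}
    // A "rule" here transforms one plan into an equivalent plan, or null if it doesn't apply.
    interface Rule extends Function<Plan, Plan> {}

    static Plan findBestExp(Plan root, List<Rule> allRules) {
        Deque<Plan> queue = new ArrayDeque<>();    // stands in for the ruleQueue
        List<Plan> registered = new ArrayList<>();
        Plan best = root;
        queue.add(root);
        while (!queue.isEmpty()) {                 // loop until no matches remain
            Plan current = queue.poll();
            for (Rule rule : allRules) {
                Plan equivalent = rule.apply(current);
                if (equivalent != null && !registered.contains(equivalent)) {
                    registered.add(equivalent);    // register the new equivalent node
                    queue.add(equivalent);         // it may match further rules
                    if (equivalent.cost() < best.cost()) best = equivalent; // cost improvement
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(
            p -> p.desc().equals("TableScan") ? new Plan("IndexScan", 10) : null,
            p -> p.desc().equals("IndexScan") ? new Plan("CachedIndexScan", 5) : null);
        System.out.println(findBestExp(new Plan("TableScan", 100), rules));
        // prints Plan[desc=CachedIndexScan, cost=5.0]
    }
}
```

**Note how the second rule only becomes applicable after the first one has produced an IndexScan: new equivalents feeding new matches is exactly why nothing is missed.**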

**That wraps up Calcite; from here you can start digging into the Flink SQL source code.**

Original article: https://www.cnblogs.com/ljygz/p/15421973.html
