cancel
Showing results for 
Search instead for 
Did you mean: 

In a select clause, what can not be resolved by hashing?

Former Member
0 Kudos

In ASE 15.0, which of the following select clauses can NOT be resolved by hashing?

  1. Distinct
  2. Group By
  3. Having
  4. Order by

Accepted Solutions (0)

Answers (3)

Answers (3)

former_member182090
Active Participant
0 Kudos

This comes from the ASE certification exam?

IMO the answer should be 4), although 3) could potentially also qualify (more on that below).

Reason is that hashing allows you to find identical (or non-identical) values very efficiently, which is suitable for many things such as distinct and group by.With distinct and group by, there is no requirement for any visible ordering of data though: it just needs to be grouped/filtered but exactly how it is grouped/filtered doesn't matter. For order-by things are different because here the order of the result set is defined as the user will see it, and this is something hashing cannot be used for.

If course, distinct and group-by can be resolved by means of classical sorting, and this is how is always worked in pre-15, it's just that hashing tends to be more efficient.

Now, on option #3 (HAVING), that doesn't actually require any sorting: HAVING predicates are applied on the groups created by GROUP BY. As such you could argue there is actually no sorting involved at all, which makes 4) the better answer for this question. I think 3) is therefore not a good potential answer for this question; UNION would have been a better one.

Rob V.

kevin_sherlock
Contributor
0 Kudos

The reasoning I was using to _guess_ that the answer was "HAVING" had to do with the fact that there are LAVA operators for DISTINCT, GROUP BY, and ORDER BY that involve hashing, but HAVING would fall under the RESTRICT operator, a different and unique operator in the LAVA engine.  I'm willing to learn something new though, and I can't argue against the reasoning that Bret and Rob give for choosing "ORDER BY" as the answer either

Former Member
0 Kudos

No reason you couldn't have answers 3 and 4.

AFAIK the question is specifically about what ASE 15 can resolve with hashing and that would be 1 and 2.

Whilst a having clause could maybe be satisfied with hashing (in other RDBMS), that would only be possible if the input into the having clause was something that needed to be filtered or sorted in some way (hash or otherwise). Perhaps if there were some way of generating a complex plan which optimistically applied the having before the grouping then there could be some scope to experiment with another hashing operator.

In ASE what makes it to the having clause is something that is simply filtered by the search argument of the having clause after the grouping has taken place.

Former Member
0 Kudos

Thanks everyone for your responses. The question is from an the Sybase ASE online practice exam. Nice to report I passed the certification exam in May 2013.

former_member188958
Active Contributor
0 Kudos

Congratulations!

kimon_moschandreou
Contributor
0 Kudos

I also think it is HAVING, anyway hash join is related mostly to allrows_dss optimization goal and should be used with really big tables (millions of rows) I don't think it provides any advantage with a few data. Also it requires too many resources (memory, tempdb space,cpu) for using it by many users concurrently.

Former Member
0 Kudos

Hi Kimon,

There are also distinct hashing and group hashing operators (as well as hash union) , so we're looking at more than hash joins.

..however...

What you said made me think (along with the question in the other forum about the non-ansi group by query)

It is possible for a hash join to satisfy having clause in a non-ansi group by query which may have columns referenced in the having clause that are not referenced in the group by clause.

It all gets a bit abstract and a bit pointless really 🙂  but in the circumstance below we can show that a having clause can be satisfied with hashing.

create table a (a1 int, a2 char(1))

go

create table b (b1 int, b2 char(1))

go

insert a values (1, 'a')

insert a values (2, 'b')

go

insert b values (1, 'a')

insert b values (2, 'c')

go

select max(a1),a2,b2 from a,b

where a1=b1

group by a2

having a2=b2

plan '( h_join ( sort ( t_scan [b@Gtt0] ) ) ( group_sorted ( sort ( m_join ( sort ( t_scan a ) ) ( sort ( t_scan [b@Gtt3] ) ) ) ) ) )'

The hash join at VA=9 (which will be in the showplan) is directly satisfying the having clause.

This also made me think about other more 'complex' having clauses, but I couldn't find a case where it may be directly resolved with any hash based operators, as the aggregation will move about (if the aggregate is in the having clause itself) or it will eagerly apply the having clauses ahead of the actual hash aggregation operation. It may be an odd one but it is at least one example of a having clause being resolved with hashing.

So I guess that does leave us with only order by that cannot directly be resolved with hashing, it would need some other operator in combination (I think?). I can't see a scenario whereby a hash based operator can take an unordered result set as input and give it a fixed order on output.

Cheers,

Simon

Former Member
0 Kudos

..one more amendment (these kind of questions niggle at me :-))

Even an ansi having clause may be resolved by hashing if indeed the optimizer decides to take care of the having clause before the group by, which it may well do provided it it is still guaranteed to give the correct results of course.

select max(a1),a2,b2 from a,b

where a1=b1

group by a2,b2

having a2=b2

plan '( group_hashing ( h_join ( t_scan a ) ( t_scan b ) ) )'

This will resolve the join predicate and the having clause using a hash join on a1=b1and a2=b2 before performing the grouped aggegration operation (which is this example will also be resolved with hashing).

kevin_sherlock
Contributor
0 Kudos

Looks like a test question to me, but my #2 pencil would circle "HAVING".  Did I pass?

former_member188958
Active Contributor
0 Kudos

Well, I think I would answer (4) Order By.  The docs mention in several places that HASH isn't good for sorting.  It is efficient for checking for the existance of values, and duplicate values would be in the same bucket, so easy to check for duplicates.