BAQ to show all customers that share emails with other customers

We need to de-dupe our customer list for our integration with Hubspot CRM, which uses unique emails as a key. I’m struggling to figure out how to make a BAQ (and ultimately as dashboard) to show all customers that share an email address with another customer.

Here’s what ChatGPT suggests:

SELECT email, COUNT(*) AS count
FROM table_name
GROUP BY email
HAVING COUNT(*) > 1;

How would I do this in BAQ?

  • I created a query with a calculated field ‘CountEmail’, and added a subquery criteria of Count>1, which works to show all emails that are used more than once, along with their count.
    image

  • Then I added CustID, which gave an aggregate error, so I checked group-by. Then 0 records are returned.

I’m sure I’m doing something simply wrong… Suggestions?

I was thinking I’d need to set the count up in a subquery, but if it’s all off the Customer table, do I need to?

Thanks for the help!

If you set your initial query as a CTE then create another query to pull Customer Name/Num and Email from the CTE I think that would work best (easiest?).

Edit: Make the second query a top-level query, and do a join on your CTE query to Customer in the new top-level query on the email address field.

You shouldn’t have to create a subquery.

Here is an example for another table.

delete.baq (20.9 KB)

Note the Having checkbox on the bottom right on the Subquery criteria

1 Like

Ken - I forgot about being able to use the “Calculated” table in a sub-query criteria so thank you - but I am not sure that gets the answer for Andris in this case because of the additional fields he wants to display. I “think” it breaks down on the “Group By” clause - since the additional fields (Customer ID/Name/etc) need to have Group By checked, nothing (or very few) would end up with a “Count” greater than 1. I feel like there is a way to overcome this with window functions like ‘over’ but we are quickly exceeding my knowledge. Is there another way around that?

Agree let’s make this better.

Lets create a calculated field to count the emails and return data form the table, so the user can see everything in one query.

COUNT(*) OVER (PARTITION BY Customer.EMailAddress)

New BAQ
delete.baq (13.3 KB)

You can make sure it works and then add to the subquery criteria to show where count is greater than 1. Which removes the use of the HAVING all together.

Who needs ChatGPT? lol epiUsers.help is so much better. am i right??

Thanks, Ken! I tried your final one, and it ran. Then I tried filtering out Count<=1 by adding a SubQuery Criteria of Count > 1, but that errors out also.

image

image

I checked ‘Having’ and that errors out the the aggregate warning again.
image

If this were a smaller table, I’d do a filter on the BAQ results, but we’ve got over 1M records.
@dmccann - I’ll give the CTE a try now. Thanks!

Edit: Sorry had not seen the response from Andris before I posted this

Nice! But apparently adding the subquery criteria throws an error of “Windowed functions can only appear in the SELECT or ORDER BY clauses.” So while Andris could probably set a filter on the eventual dashboard… instead I went to ChatGPT (why not) and it suggested a CTE or subquery :man_facepalming: I asked about doing it without one of those and it suggested a ‘derived table’ which did work fine in a SQL query but not sure how to do that in a BAQ. Here is the Select statement that worked:

	SELECT
    dt.Calculated_Count,
    dt.Customer_CustID,
    dt.Customer_CustNum,
    dt.Customer_EMailAddress
FROM (
    SELECT
        COUNT(*) OVER (PARTITION BY Customer.EMailAddress) AS [Calculated_Count],
        Customer.CustID AS [Customer_CustID],
        Customer.CustNum AS [Customer_CustNum],
        Customer.EMailAddress AS [Customer_EMailAddress]
    FROM
        Erp.Customer AS Customer
) AS dt
WHERE
    dt.Calculated_Count > 1;

Guess you have to have a subquery with window functions as you cannot do a select on them.

Check out this one.

delete.baq (20.6 KB)

Awesome sauce Ken and Dan! Thank you very much for the advice and examples. I added an additional subquery criteria to filter out the 984k customers with an empty string email address.