Fix: Power Query Column Duplication Bug After Refresh

by Admin 54 views
Bug: Column Duplication After Power Query Refresh (pq-refresh β†’ pq-loadto)

Hey guys! Ever run into a weird bug where your Power Query duplicates columns after a refresh? Yeah, super annoying, right? Especially when you're dealing with connection-only queries. Let's dive into what was causing this issue and how we squashed it!

Bug Description

So, here's the deal. Imagine you've got an LLM (or a user) rocking a Power Query in connection-only mode. You do the following:

  1. Refresh the query using pq-refresh.
  2. Try to load it to a table using pq-loadto or SetLoadToTableAsync.

Boom! What happens? A whole mess of problems:

  • ❌ Columns are duplicated. You get multiple identical columns where you should only have one.
  • ❌ Data loads as a plain range instead of the proper QueryTable.
  • ❌ Multiple QueryTables might be created when you should only have one.

This can really mess up your data and your workflow. Let's dig into how to reproduce it.

Reproduction Steps

To recreate this lovely bug, just follow these steps:

  1. Create a connection-only Power Query with a single column.
  2. Call RefreshAsync() on that query.
  3. Use SetLoadToTableAsync() to load the data to your worksheet.
  4. Observe the horror: duplicated columns and a plain range instead of the expected QueryTable.

Now that we know how to make it happen, let's talk about why it was happening in the first place.

Root Cause

The culprit lies within the code, specifically in SetLoadToTableAsync and potentially in UpdateAsync. Let's break it down.

SetLoadToTableAsync (PowerQueryCommands.LoadConfig.cs)

Original Code (STEP 3 cleanup):

// Delete ALL existing QueryTables on sheet (not query-specific)
dynamic? queryTables = targetSheet.QueryTables;
for (int i = queryTables.Count; i >= 1; i--)
{
    dynamic? qt = queryTables.Item(i);
    qt?.Delete();
    ComUtilities.Release(ref qt);
}

// THEN called usedRange.Clear() - THIS WAS THE BUG!
usedRange.Clear();  // ❌ Harmful when no data exists yet

The Problem:

  • The code was calling usedRange.Clear() before creating the QueryTable. This is like wiping the canvas clean before you even start painting!
  • For connection-only queries (where there's no existing data yet), this created a "dirty" state in Excel.
  • When the QueryTable was finally created, Excel didn't initialize it properly. It was all messed up!
  • The result? Data loaded as a regular range, and those pesky columns got duplicated.

UpdateAsync (PowerQueryCommands.Lifecycle.cs)

Current Code Pattern:

// Uses RemoveQueryTablesFromSheet (query-specific deletion)
PowerQueryHelpers.RemoveQueryTablesFromSheet(targetSheet, queryName);

// ALSO calls usedRange.Clear()
usedRange.Clear();  // ⚠️ May have same issue

Status: The usedRange.Clear() call here is still under investigation. Removing it causes test failures, which means columns start accumulating. More on this later!

So, now that we understand the root cause, let's talk about how we fixed it.

Solution Implemented

SetLoadToTableAsync Fix

Changed to:

// Delete ONLY QueryTables for this specific query (consistent with UpdateAsync)
PowerQueryHelpers.RemoveQueryTablesFromSheet(targetSheet, queryName);

// NO usedRange.Clear() - let Excel handle data cleanup
// QueryTable deletion is sufficient

Benefits:

  • βœ… Now, it only removes QueryTables for the specific query, not all of them.
  • βœ… We're letting Excel handle the data cleanup, which it's much better at.
  • βœ… No more interference with QueryTable creation.
  • βœ… It's consistent with the UpdateAsync pattern.

Basically, we stopped messing with things we shouldn't have been messing with! This leads to a more stable and predictable result.

Test Coverage Enhanced

We also beefed up the tests to make sure this bug stays squashed. Here's what we did:

Bug_RefreshConnectionOnlyThenSetLoadToTable_CreatesRangeInsteadOfTable

We added assertions to verify:

  • βœ… A QueryTable exists (not just a regular range).
  • βœ… There's exactly 1 QueryTable (no duplicates allowed!).
  • βœ… No more duplicate columns.

Update_QueryColumnStructure_UpdatesWorksheetColumns

We added assertions after each UpdateAsync to verify:

  • βœ… The QueryTable still exists (hasn't been converted to a range).
  • βœ… There's exactly 1 QueryTable (still no duplicates!).
  • βœ… The column count is correct (no sneaky accumulation).

Basically, we're making sure that the QueryTable stays a QueryTable, and that we don't end up with a pile of extra columns.

Files Changed

Here's a list of the files that got a makeover:

  • src/ExcelMcp.Core/Commands/PowerQuery/PowerQueryCommands.LoadConfig.cs - SetLoadToTableAsync refactored.
  • tests/ExcelMcp.Core.Tests/Integration/Commands/PowerQuery/PowerQueryLoadToTableTests.cs - Enhanced test assertions.
  • tests/ExcelMcp.Core.Tests/Integration/Commands/PowerQuery/PowerQueryCommandsTests.cs - Added QueryTable checks.

Testing

Good news, everyone! All tests are passing:

  • βœ… Bug_RefreshConnectionOnlyThenSetLoadToTable_CreatesRangeInsteadOfTable - 21s
  • βœ… Update_QueryColumnStructure_UpdatesWorksheetColumns - 22s

Related Questions

Now, a couple of lingering questions to ponder:

  1. Should UpdateAsync also remove usedRange.Clear()?

    • Currently, usedRange.Clear() is still hanging out in UpdateAsync.
    • Evidence suggests that removing it causes test failures (columns accumulate – yikes!).
    • The hunch is that it shouldn't be necessary, but...
    • ...it needs more digging.
  2. Why the different behavior between SetLoadToTableAsync and UpdateAsync?

    • SetLoadToTableAsync context: Connection-only, so there's no existing data.
    • UpdateAsync context: There's already a QueryTable with data.
    • Does QueryTable.Delete() behave differently depending on the context? That's the question.

Impact

This fix has some pretty sweet impacts:

  • LLM workflows: Connection-only β†’ Refresh β†’ LoadToTable now actually works correctly. Huzzah!
  • User workflows: No more duplicated columns or plain ranges messing up your spreadsheets.
  • Consistency: SetLoadToTableAsync now uses the same pattern as UpdateAsync (using RemoveQueryTablesFromSheet).

So, there you have it! A deep dive into a tricky bug and how we squashed it. Hopefully, this helps you avoid similar headaches in the future! Happy querying!