Skip to content

How to make transfer data faster while using databricks-sql-go #235

@calebeaires

Description

@calebeaires

I am using the driver to run a data migration. When dealing with a table of 24 million rows and 9 columns, the performance is excellent when I fetch 10 thousand rows. When I increase the fetch size to 100 thousand rows, the transfer speed is still good. However, when fetching 1 million rows or more, the data transfer becomes very slow. A quick test made me think that the driver tries to fetch all the data in a single batch. Is there a way to improve this process?

This is the connection string

"token:xxx@host:443$xxx-path?catalog=sample&database=big_table&useCloudFetch=true&maxRows=10000"

I've tried to use this two settings, data transfer is still slow to big tables.

  • useCloudFetch=true
  • maxRows=10000

In forums users suggest change spark.driver.maxResultSize ?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions