Polars: py-1.25.2 Release

Release date: March 15, 2025
Previous version: py-1.24.0 (released March 3, 2025)
Magnitude: 23,474 Diff Delta
Contributors: 19 total committers

130 Commits in this Release

Top Contributors in py-1.25.2

coastalwhite
nameexhaustion
orlp
ritchie46
erikbrinkman
wence-
Matt711
itamarst
lukemanley
mcrumiller

Release Notes Published

🏆 Highlights

  • Enable common subplan elimination across plans in collect_all (#21747)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Enable new streaming memory sinks by default (#21589)

🚀 Performance improvements

  • Implement linear-time rolling_min/max (#21770)
  • Improve InputIndependentSelect by delegating to InMemorySourceNode (#21767)
  • Enable common subplan elimination across plans in collect_all (#21747)
  • Allow elementwise functions in recursive lowering (#21653)
  • Add primitive single-key hashtable to new-streaming join (#21712)
  • Remove unnecessary black_boxes in Kahan summation (#21679)
  • Box large enum variants (#21657)
  • Improve join performance for new-streaming engine (#21620)
  • Pre-fill caches (#21646)
  • Optimize only a single cache input (#21644)
  • Collect parquet statistics in one contiguous buffer (#21632)
  • Update Cargo.lock (mainly for zstd 1.5.7) (#21612)
  • Don't maintain order when maintain_order=False in new streaming sinks (#21586)
  • Pre-sort groups in group-by-dynamic (#21569)
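
For readers unfamiliar with how a rolling minimum can run in linear time (the "Implement linear-time rolling_min/max" item above), here is a minimal sketch of the classic monotonic-deque technique in plain Python. It is an illustration of the general idea only: the function name `rolling_min` and its list-based interface are assumptions made for this example, and the Polars implementation in #21770 may work differently.

```python
from collections import deque


def rolling_min(values, window):
    """Return the minimum of every full window of `window` consecutive items.

    Hypothetical helper for illustration; not part of the Polars API.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    # Indices of candidate minima; their values increase from front to back.
    candidates = deque()
    out = []
    for i, v in enumerate(values):
        # Older items that are >= v can never be the minimum again.
        while candidates and values[candidates[-1]] >= v:
            candidates.pop()
        candidates.append(i)

        # Drop the front candidate once it has slid out of the window.
        if candidates[0] <= i - window:
            candidates.popleft()

        # Emit a result only once the first full window is available.
        if i >= window - 1:
            out.append(values[candidates[0]])
    return out


print(rolling_min([4, 2, 12, 3, 5, 1], 3))
```

Each index is appended and removed at most once, so the total work is proportional to the input length regardless of the window size. A rolling maximum follows by flipping the comparison.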

✨ Enhancements

  • Add support for rolling_(sum/min/max) for booleans through casting (#21748)
  • Support multi-column sort for all nested types and nested search-sorted (#21743)
  • Add lazy sinks (#21733)
  • Add PartitionByKey for new streaming sinks (#21689)
  • Fix replace flags (#21731)
  • Add mkdir flag to sinks (#21717)
  • Enable joins on list/array dtypes (#21687)
  • Add a config option to specify the default engine to attempt to use during lazyframe calls (#20717)
  • Support all elementwise functions in IO plugin predicates (#21705)
  • Stabilize Enum datatype (#21686)
  • Support Polars int128 in from arrow (#21688)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Enable new streaming memory sinks by default (#21589)
  • Cloud support for new-streaming scans and sinks (#21621)
  • Add len method to arr (#21618)
  • Closeable files on unix (#21588)
  • Add new PartitionMaxSize sink (#21573)
  • Support engine callback for LazyFrame.profile (#21534)
  • Dispatch new-streaming CSV negative slice to separate node (#21579)
  • Add NDJSON source to new streaming engine (#21562)
  • Support passing token in storage_options for GCP cloud (#21560)

🐞 Bug fixes

  • Expose and document partitions (#21765)
  • Fix lazy schema for truediv ops involving List/Array dtypes (#21764)
  • Fix error due to race condition in file cache (#21753)
  • Clear NaNs due to zero-weight division in rolling var/std (#21761)
  • Allow init from BigQuery Arrow data containing ExtensionType cols with irrelevant metadata (#21492)
  • Disallow cast from boolean to categorical/enum (#21714)
  • Don't check sortedness in join_asof when 'by' groups supplied, but issue warning (#21724)
  • Incorrect multithread path taken for aggregations (#21727)
  • Disallow cast to empty Enum (#21715)
  • Fix list.mean and list.median returning Float64 for temporal types (#21144)
  • Incorrect (FixedSize)ListArrayBuilder gather implementation (#21716)
  • Always fallback in SkipBatchPredicate (#21711)
  • New streaming multiscan deadlock (#21694)
  • Ensure new-streaming join BuildState is correct even if never fed morsels (#21708)
  • IO plugin; support empty iterator (#21704)
  • Support nulls in multi-column sort (#21702)
  • Window function check length of groups state (#21697)
  • Support 128 sum reduction on new streaming (#21691)
  • IPC round-trip of list of empty view with non-empty bufferset (#21671)
  • Variance can never be negative (#21678)
  • Incorrect loop length in new-streaming group by (#21670)
  • Right join on multiple columns not coalescing left_on columns (#21669)
  • Casting Struct to String panics if n_chunks > 1 (#21656)
  • Fix "Future attached to different loop" error on read_database_uri (#21641)
  • Fix deadlock in cache + hconcat (#21640)
  • Properly handle phase transitions in row-wise sinks (#21600)
  • Enable new streaming memory sinks by default (#21589)
  • Always use global registry for object (#21622)
  • Check enum categories when reading csv (#21619)
  • Unspecialized prefiltering on nullable arrays (#21611)
  • Release the gil on explain (#21607)
  • Take into account scalar/partitioned columns in DataFrame::split_chunks (#21606)
  • Bad null handling in unordered row encoding (#21603)
  • Fix deadlock in new streaming CSV / NDJSON sinks (#21598)
  • Bad view index in BinaryViewBuilder (#21590)
  • Fix CSV count with comment prefix skipped empty lines (#21577)
  • New streaming IPC enum scan (#21570)
  • Several aspects related to ParquetColumnExpr (#21563)
  • Don't hit parquet::pre-filtered in case of pre-slice (#21565)
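
As background for the "Variance can never be negative" item above, the sketch below shows one way a floating-point variance can be kept non-negative. It assumes the textbook formula E[x²] - E[x]², which can round to a tiny negative number when the values are nearly identical, and guards the result with a clamp. The function name `population_variance` is hypothetical, and this is not necessarily what #21678 changed in Polars.

```python
def population_variance(values):
    """Population variance via E[x^2] - E[x]^2, clamped at zero.

    Hypothetical helper for illustration; not part of the Polars API.
    """
    n = len(values)
    if n == 0:
        raise ValueError("variance of an empty sequence is undefined")

    mean = sum(values) / n
    mean_of_squares = sum(v * v for v in values) / n

    # Rounding error can push this difference slightly below zero.
    raw = mean_of_squares - mean * mean

    # A variance is non-negative by definition, so clamp the result.
    return max(raw, 0.0)


print(population_variance([1.0, 2.0, 3.0, 4.0]))
```

Clamping also keeps a later square root (for a standard deviation) from failing on a negative input.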

📖 Documentation

  • Add skrub to ecosystem.md (#21760)
  • Add example for percentile rank (#21746)
  • Make python/rust getting-started consistent and clarify performance risk of infer_schema_length=None (#21734)
  • Add expression composability to PySpark comparison (#21473)
  • Document read_().lazy() antipattern (#21623)
  • Update Polars Cloud interactive workflow examples (#21609)
  • Add a Plotnine example to the visualization docs (#21597)
  • Add cloud api reference to Ref guide (#21566)

🛠️ Other improvements

  • Remove variance numerical stability hack (#21749)
  • Only use chrono_tz timezones in hypothesis testing (#21721)
  • Remove order check from flaky test (#21730)
  • Add sinks into the DSL before optimization (#21713)
  • Add missing test case for #21701 (#21709)
  • Remove old-streaming from engine argument (#21667)
  • Add as_phys_any to PrivateSeries for downcasting (#21696)
  • Use FFI to read dataframe instead of transmute (#21673)
  • Work around typos ignore bug (#21672)
  • Added Test For datetime_range Nanosecond Overflow (#21354)
  • Update to edition 2024 (#21662)
  • Update rustc (#21647)
  • Support object from chunks (#21636)
  • Push versioned docs on workflow dispatch (#21630)
  • Fail docs early (#21629)
  • Check major/minor in docs (#21626)
  • Add docs workflow (#21624)
  • Add test for 21581 (#21617)
  • Remove even more parquet multiscan handling (#21601)
  • Remove multiscan handling from new streaming parquet source (#21584)
  • Prepare skeleton for partitioning sinks (#21536)

Thank you to all our contributors for making this release possible! @GaelVaroquaux, @Kevin-Patyk, @MarcoGorelli, @Matt711, @NathanHu725, @alexander-beedie, @coastalwhite, @dependabot[bot], @jrycw, @kdn36, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @r-brink, @ritchie46 and @wence-