Polars: py-1.20.0 Release

Release date: February 13, 2025
Previous version: py-1.19.0 (released January 10, 2025)
Magnitude: 16,367 Diff Delta
Contributors: 19 total committers

147 Commits in this Release

Top Contributors in py-1.20.0

coastalwhite
orlp
nameexhaustion
ritchie46
alexander-beedie
itamarst
etiennebacher
mcrumiller
lukemanley
kjgoodrick

Release Notes Published

⚠️ Deprecations

  • Make parameter of str.to_decimal keyword-only (#20570)

🚀 Performance improvements

  • Extend functionality on BitmapBuilder and use in Growables (#20754)
  • Specialize first/last agg for simple types in new-streaming engine (#20728)
  • Use PyO3 to convert between Python and Rust datetimes (#20660)
  • Improve state caching and parallelism of window functions (#20689)
  • Broadcast without materialization in concat_arr (#20681)
  • Cache rolling groups (#20675)
  • Use downcast_ref instead of dtype equality in <dyn SeriesTrait as AsRef<ChunkedArray<T>>> (#20664)
  • Fix performance regression for DataFrame serialization/pickling (#20641)
  • Make Parquet verify_dict_indices SIMD (#20623)
  • Move to zlib-rs by default and use zstd::with_buffer (#20614)
  • Skip filter expansion in eager (#20586)
  • Improve unique pred-pd (#20569)

✨ Enhancements

  • Allow different python versions for pickle (#20740)
  • Add SQL support for the NORMALIZE string function (#20705)
  • Add 'allow_exact_matches' to join_asof (#20723)
  • Add new-streaming first/last aggregations (#20716)
  • Add Parquet Sink to new streaming engine (#20690)
  • Make automatic use of Azure storage account keys opt-in (#20652)
  • Reduce scan_csv() (and friends') memory usage when using BytesIO (#20649)
  • Improve GroupsProxy/GroupsPosition to be sliceable and cheaply cloneable (#20673)
  • Add str.normalize() (#20483)
  • Allow more group_by agg expressions in the new streaming engine (#20663)
  • Support loading Excel Table objects by name (#20654)
  • Support writing to file objects from write_excel (#20638)
  • Raise DuplicateError if given a pyarrow Table object with duplicate column names (#20624)
  • Support writing partitioned parquet to cloud (#20590)
  • Add hint to error message for extra struct field in JSON (#20612)
  • Add index_of() function to Series and Expr (#19894)
  • Update sqlparser-rs, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries (#20576)
  • Add cat.starts_with/cat.ends_with (#20257)

🐞 Bug fixes

  • Avoid blocking on async runtime when resolving cloud scans (#20750)
  • Fix allow_invalid_certificates being ignored in storage_options (#20744)
  • Incorrect output type for map_groups returning all-NULL column (#20743)
  • Fix unique(maintain_order=True) raising InvalidOperationError for null array (#20737)
  • Don't collapse into a Nested Loop Join if the cross join maintains order (#20729)
  • Don't serialize credentials provider (#20741)
  • Fix Series.n_unique raising for list of struct (#20724)
  • Fix incorrect top-k by sorted column, fix head() returning extra rows (#20722)
  • Add outer validity to AnyValueBufferTrusted for structs (#20713)
  • Don't partition group-by with non-scalar literals in agg (#20704)
  • Fix xor operation of selector with Expr (#20702)
  • Incorrect view buffer dedup (#20691)
  • Only verify Parquet ConvertedType if no LogicalType is given (#20682)
  • Validate length of schema_overrides in read_csv (#20672)
  • Fix map_elements ignoring skip_nulls=True for struct dtype (#20668)
  • Check for MAP-GROUPS in cloud-eligible (#20662)
  • Fix empty output of to_arrow() on filtered unit height DataFrame (#20656)
  • Add .default to azure credential provider scope URL (#20651)
  • Fix join_asof panicking for invalid tolerance input (#20643)
  • Incorrect flag check on is_elementwise (#20646)
  • Don't panic but set null type if type is unknown (#20647)
  • Fix performance regression for DataFrame serialization/pickling (#20641)
  • Fix Int128 dtype serialization (#20629)
  • Ensure read_excel and read_ods support reading from raw bytes for all engines (#20636)
  • Ensure that SQL LIKE and ILIKE operators support multi-line matches (#20613)
  • Properly broadcast in sort_by (#20434)
  • Properly load nested Parquet Statistics (#20610)
  • AWS environment config was not loaded when credential provider was used (#20611)
  • Fix order observability of group-by-dyn (#20615)
  • Soundness when loading Parquet string statistics (#20585)
  • Fix error filtering after with_columns() on unit height LazyFrame (#20584)
  • Propagate tenant_id to CredentialProviderAzure if given (#20583)
  • Restore symbols on Apple by bumping nightly version (#20563)
  • Fix type annotation of str.strip_chars_* methods (#20565)
  • Fix variable name in error message for "unsupported data type" in rolling and upsampling operations (#20553)

📖 Documentation

  • Add more information for cross joins (#20753)
  • Fix typo in sql functions (cosinus -> cosine) (#20676)
  • Add links to read_excel "engine_options" and "read_options" docstring (#20661)
  • Fix small typo in plugins (polars-dt -> polars-st) (#20657)
  • Add polars-h3 and polars-st to plugin list (#20653)
  • Add docs reference for Field (#20625)
  • Update DataFrame join examples (#20587)
  • Miscellaneous minor updates/fixes (#20573)
  • Update "group_by_rolling" (deprecated) to "rolling" in user guide (#20548)

📦 Build system

  • Update to official release of PyO3 0.23.4 (#20683)
  • Officially support Python 3.13 (#20549)

🛠️ Other improvements

  • Fix remote benchmark script (#20755)
  • Fix tests (#20745)
  • Simplify hive predicate handling in NEW_MULTIFILE (#20730)
  • Add tests for various open issues (#20720)
  • Fixes an Excel test following new fastexcel release (#20703)
  • Add tests for various open issues that have been fixed (#20680)
  • Don't include debug symbols in benchmark run (#20571)
  • Implement CSV, IPC and NDJson in the MultiScanExec node (#20648)
  • Don't rely on argument order of optimization_toggle (#20622)
  • Fix Python deps installation in remote-benchmark workflow (#20619)
  • Fix flaky categorical test (#20591)
  • Bump multiversion from 0.7 to 0.8 (#20543)
  • Remove unused nested function in LazyFrame.fill_null (#20558)
  • Improve bin size info (#20551)

Thank you to all our contributors for making this release possible! @Jesse-Bakker, @MarcoGorelli, @MoizesCBF, @SamuelAllain, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @eitsupi, @etiennebacher, @itamarst, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego