Polars: py-1.22.0 Release

Release date: February 11, 2025
Previous version: py-1.21.0 (released February 4, 2025)
Magnitude: 20,618 Diff Delta
Contributors: 24 total committers

121 Commits in this Release

Top Contributors in py-1.22.0

coastalwhite
orlp
nameexhaustion
ritchie46
lukemanley
alexander-beedie
etiennebacher
mcrumiller
c-peters
PrettyWood

Release Notes Published

🚀 Performance improvements

  • Reduce sharing in stringview arrays in new-streaming equijoin (#21129)
  • Implement native Expr.count() on new-streaming (#21126)
  • Speed up list operations that use amortized_iter() (#20964)
  • Use Cow as output for rechunk and add rechunk_mut (#21116)
  • Reduce arrow slice mmap overhead (#21113)
  • Reduce conversion cost in chunked string gather (#21112)
  • Enable prefiltered by default for new streaming (#21109)
  • Enable parquet column expressions for streaming (#21101)
  • Deduplicate buffers again in stringview concat kernel (#21098)
  • Add dedicated concatenate kernels (#21080)
  • Rechunk only once during join probe gather (#21072)
  • Micro-optimise internal DataFrame height and width checks (#21071)
  • Speed up from_pandas when converting frame with multi-index columns (#21063)
  • Change default memory prefetch to MADV_WILLNEED (#21056)
  • Remove cast to boolean after comparison in optimizer (#21022)
  • Split last rowgroup among all threads in new-streaming parquet reader (#21027)
  • Recombine into larger morsels in new-streaming join (#21008)
  • Improve list.min and list.max performance for logical types (#20972)
  • Ensure count query select minimal columns (#20923)

✨ Enhancements

  • Add projection pushdown to new streaming multiscan (#21139)
  • Implement join on struct dtype (#21093)
  • Use unique temporary directory path per user and restrict permissions (#21125)
  • Enable ingest of objects supporting the PyCapsule interface via from_arrow (#21128)
  • Enable new streaming multiscan for CSV (#21124)
  • Environment POLARS_MAX_CONCURRENT_SCANS in multiscan for new streaming (#21127)
  • Ensure AWS credential provider sources AWS_PROFILE from environment after deserialization (#21121)
  • Multi/Hive scans in new streaming engine (#21011)
  • Add linear_spaces (#20941)
  • IO plugins support lazy schema (#21079)
  • Add write_table() function to Unity catalog client (#21089)
  • Add is_object method to Polars DataType class (#21074)
  • Implement merge_sorted for binary (#21045)
  • Hold string cache in new streaming engine and fix row-encoding (#21039)
  • Add CredentialProviderAzure parameter to accept user-instantiated azure credential classes (#21047)
  • Expose unity catalog dataclasses and type aliases (#21046)
  • Support max/min method for Time dtype (#19815)
  • Implement a streaming merge sorted node (#20960)
  • Automatically use temporary credentials API for scanning Unity catalog tables (#21020)
  • Add negative slice support to new-streaming engine (#21001)
  • Allow for more RG skipping by rewriting expr in planner (#20828)
  • Rename catalog schema to namespace (#20993)
  • Add functionality to create and delete catalogs, tables and schemas to Unity catalog client (#20956)
  • Allow custom JSONEncoder for the json_normalize function, minor speedup (#20966)
  • Support passing aws_profile in storage_options (#20965)
  • Improved support for KeyboardInterrupts (#20961)
  • Make the available concat alignment strategies more generic (#20644)
  • Extract timezone info from python datetimes (#20822)
  • Add hint for POLARS_AUTO_USE_AZURE_STORAGE_ACCOUNT_KEY to error message (#20942)
  • Filter Parquet pages with ParquetColumnExpr (#20714)
  • Expose descending and nulls last in window order-by (#20919)

🐞 Bug fixes

  • Fix Expr.over applying scale incorrectly for Decimal types (#21140)
  • Fix IO plugin predicate with failed serialization (#21136)
  • Ensure lit handles datetimes with tzinfo that represents a fixed offset from UTC (#21003)
  • Correctly implement take_(opt_)chunked_unchecked for structs (#21134)
  • Restore printing backtraces on panics (#21131)
  • Use microseconds for Unity catalog datetime unit (#21122)
  • Fix incorrect output height for SQL SELECT COUNT(*) FROM (#21108)
  • Validate/coerce types for comparisons within join_where predicates (#21049)
  • Do not auto-init credential providers if credential fetch returns error (#21090)
  • Fix join_where incorrectly dropping transformations on RHS of equality expressions (#21067)
  • Quadratic allocations when loading nested Parquet column metadata (#21050)
  • Invalidate sortedness flag when sorting from pl.Categorical to pl.Categorical("lexical") (#21044)
  • Calling top_k on list type panics (#21043)
  • Fix rolling on empty DataFrame panicking (#21042)
  • Fix set_tbl_width_chars panicking with negative width (#20906)
  • Ensure write_excel recognises the Array dtype and writes it out as a string (#20994)
  • Fix merge_sorted producing incorrect results or panicking for some logical types (#21018)
  • Fix all-null list aggregations returning Null dtype (#20992)
  • Ensure scalar-only with_columns are broadcasted on new-streaming (#20983)
  • Improve SQL interface behaviour when INTERVAL is not a fixed duration (#20958)
  • Address minor regression for one-column DataFrame passed to is_in expressions (#20948)
  • Add Arrow Float16 conversion DataType (#20970)
  • Revert length check of patterns in str.extract_many() (#20953)
  • Add maintain order for flaky new-streaming test (#20954)
  • Allow for respawning of new streaming sinks (#20934)
  • Ensure Function name correctness in cse (#20929)
  • Don't consume c_stream as iterable (#20899)
  • Validate pl.Array shape argument types (#20915)
  • Fix from_numpy returning Null dtype for empty 1D numpy array (#20907)
  • Consider the original dtypes when selecting columns in write_excel function (#20909)
  • Handle boolean comparisons in Iceberg predicate pushdown (#18199)
  • Fix map_elements panicking with Decimal type (#20905)

📖 Documentation

  • Replace pandas where with mask in Migrating -> Coming from Pandas (#21085)
  • Correct Arrow misconception (#21053)
  • Add example showing use of write_delta with delta_lake.WriterProperties (#20746)
  • Add missing shape param to Array docstring (#20747)
  • Add IO plugins to Python API reference (#21028)
  • Document IO plugins (#20982)
  • Ensure set_sorted description references single-column behavior (#20709)

📦 Build system

  • Speed up CI by running a few more tests in parallel (#21057)

🛠️ Other improvements

  • Add test for equality filters in Parquet (#21114)
  • Add various tests for open issues (#21075)
  • Upgrade packages and apply latest formatting (#21086)
  • Move python dsl and builder_dsl code to dsl folder (#21077)
  • Organize python related logics in polars-plan (#21070)
  • Improve binary dispatch (#21061)
  • Skip physical order test (#21060)
  • Fix new ruff lints (#21040)
  • Added test to check for the computation of list.len for null (#20938)
  • Add make fix for running cargo clippy --fix (#21024)
  • Add tests for resolved issues (#20999)
  • Update code coverage workflow to use macos-latest runners (#20995)
  • Remove unused arrow file (#20974)
  • Deprecate the old streaming engine (#20949)
  • Move dt.replace tests to dedicated file, add "typing :: typed" classifier, remove unused testing function (#20945)
  • Extract merge sorted IR node (#20939)
  • Update copyright year (#20764)
  • Move Parquet deserialization to BitmapBuilder (#20896)
  • Also publish polars-python (#20933)
  • Remove verify_dict_indices_slice from main (#20928)
  • Add tests for already resolved issues (#20921)
  • Fix the verify_dict_indices codegen (#20920)
  • Add ProjectionContext in projection pushdown opt (#20918)

Thank you to all our contributors for making this release possible! @FBruzzesi, @MarcoGorelli, @aberres, @alexander-beedie, @arnabanimesh, @bschoenmaeckers, @coastalwhite, @deanm0000, @dependabot[bot], @dimfeld, @eitsupi, @etiennebacher, @henryharbeck, @itamarst, @lmmx, @lukemanley, @mcrumiller, @mullimanko, @nameexhaustion, @orlp, @petrosbar, @ritchie46, @siddharth-vi, @skritsotalakis and @taureandyernv