Polars: py-0.20.0 Release

Release date: December 16, 2023
Previous version: py-0.19.19 (released December 1, 2023)
Magnitude: 11,475 Diff Delta
Contributors: 21 total committers

115 Commits in this Release

Top Contributors in py-0.20.0

stinodego
orlp
ritchie46
c-peters
nameexhaustion
MarcoGorelli
mcrumiller
ion-elgreco
alexander-beedie
rob-sil

Release Notes Published

This version includes quite a few breaking changes. We are preparing for the 1.0 release and aim to make the upgrade from 0.20 to 1.0 as smooth as possible. Therefore, we prioritized getting any breaking changes in now rather than with 1.0.

Check out the upgrade guide for help navigating the upgrade to this version.

Please bear with us while we continue to make Polars the best tool it can be!

🏆 Highlights

  • Add new Enum categorical data type which allows a fixed set of categories (#11822)
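
To illustrate what "a fixed set of categories" means in practice, here is a minimal plain-Python model of the idea. It is not Polars code and does not reflect the library's implementation: the class name `FixedCategories` and its methods are hypothetical, and ordering by declaration order is an assumption about how comparisons on such a type behave.

```python
class FixedCategories:
    """Plain-Python model of a fixed-category (Enum-like) column type."""

    def __init__(self, categories):
        if len(set(categories)) != len(categories):
            raise ValueError("categories must be unique")
        self.categories = list(categories)
        # Each category maps to its position in the declared order.
        self._codes = {cat: i for i, cat in enumerate(self.categories)}

    def encode(self, values):
        """Map values to integer codes; reject anything outside the fixed set."""
        codes = []
        for value in values:
            if value is None:
                codes.append(None)  # nulls stay null
            elif value in self._codes:
                codes.append(self._codes[value])
            else:
                raise ValueError(f"{value!r} is not one of {self.categories}")
        return codes

    def less_than(self, left, right):
        """Compare two categories by declared order, not alphabetically."""
        return self._codes[left] < self._codes[right]


sizes = FixedCategories(["small", "medium", "large"])
print(sizes.encode(["large", "small", None]))  # [2, 0, None]
print(sizes.less_than("small", "medium"))      # True
```

Because the category set is declared up front, an unexpected value fails loudly instead of silently creating a new category.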

💥 Breaking changes

  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Reimplement replace expression on the Rust side (#13002)
  • Preserve left and right join keys in outer joins (#12963)
  • Update update signature (#12986)
  • Update Expr.count to ignore null values by default (#12934)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Allow all DataType objects to be instantiated (#12470)
  • Change value_counts resulting column name from counts to count (#12506)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Default to exact checking for integers in assertion utils (#12331)
  • Set default dtype for Series to Null when no data is present (#12807)
  • Update lit behavior for list/tuple inputs (#12559)
  • Change DataType.is_nested from property to classmethod (#12453)
  • Update constructors for Array and Decimal (#12837)
  • Smaller integer data types for datetime components (#12070)
  • Fix NaN ordering to make NaNs compare greater than any other float, and equal to themselves (#12721)
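
Three of the changes above alter results without raising an error, so they are worth modelling. The sketch below is plain Python, not Polars code: `None` stands in for a null, all helper names are hypothetical, and the join model assumes the new default is that null keys do not match each other unless `join_nulls=True` is passed.

```python
import math


def count_non_null(values):
    """Model of the new Expr.count default: nulls are not counted."""
    return sum(v is not None for v in values)


def inner_join_keys(left, right, join_nulls=False):
    """Return matching key pairs; null matches null only if join_nulls=True."""
    pairs = []
    for l_key in left:
        for r_key in right:
            if l_key is None or r_key is None:
                if join_nulls and l_key is None and r_key is None:
                    pairs.append((l_key, r_key))
            elif l_key == r_key:
                pairs.append((l_key, r_key))
    return pairs


def total_order_key(x):
    """Sort key under which NaN is greater than any other float and equal to itself."""
    is_nan = math.isnan(x)
    return (is_nan, 0.0 if is_nan else x)


print(count_non_null([1, None, 3, None]))                    # 2
print(inner_join_keys([1, None], [1, None]))                  # [(1, 1)]
print(inner_join_keys([1, None], [1, None], join_nulls=True)) # [(1, 1), (None, None)]
print(sorted([float("nan"), 1.0, float("inf"), -2.0], key=total_order_key))
# [-2.0, 1.0, inf, nan]
```

Code that relied on the old behavior (counting nulls, joining on null keys, or NaN placement in sorted output) should be checked against the upgrade guide.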

⚠️ Deprecations

  • Rename write_database parameter if_exists to if_table_exists (#12783)

🚀 Performance improvements

  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Elide allocation in outer join materialization (#12992)
  • Avoid dispatching Series.head/tail to the expression engine (#12946)
  • Ensure we reduce for any/all_horizontal (#12976)
  • Add fast paths for UTC in truncate (#12965)
  • Use select_seq for expression dispatch (#12962)
  • Improve rolling_median algorithm (#12704)
  • Use fast path for non-null data in new SQL-like null matching (#12874)
  • Optimize DataFrame.iter_rows for smaller buffer sizes (#12804)
  • Speed up initializing Series from a list of NumPy arrays (#12785)

✨ Enhancements

  • Add str.contains_any and str.replace_many (Aho-Corasick algorithms) (#13073)
  • Auto-infer credentials from .aws folder (#13062)
  • Support private cloud S3 storage in scan_parquet (#13060)
  • Use Object Store instead of fsspec for read_parquet (#13044)
  • Avoid dispatching to expression engine for various Series methods (#13010)
  • Allow order operators (<,>,>=,<=) on Enum types (#12982)
  • Reimplement replace expression on the Rust side (#13002)
  • Expand set of NumPy functions which emit inefficient map_* warning (#13039)
  • Use tokio semaphore for concurrency handling (#13026)
  • Improve and expressify hist (#13014)
  • Update describe to use new count implementation (#12990)
  • Add default to_struct Series name consistent with the usual default Series name (empty string) (#12998)
  • Preserve left and right join keys in outer joins (#12963)
  • Clarify "inefficient map_elements" warning message (#12978)
  • Allow end before start in date/time_range (#12964)
  • Update update signature (#12986)
  • Minor update to Array data type repr (#12973)
  • Implement group-tuples for Null dtype (#12975)
  • Cast to an enum from int (#12954)
  • Move categorical ordering into dtype (#12911)
  • Avoid importing interchange module by default (#12927)
  • Update Expr.count to ignore null values by default (#12934)
  • Raise if expression passed as scalar to DataFrame constructor (#12916)
  • Update repr of Struct data type class (#12922)
  • Enable partial predicate pushdown past window expressions (#12710)
  • Add merge mode to write_delta and remove pyarrow to delta conversions (#12392)
  • Add str.reverse (#12878)
  • Allow all DataType objects to be instantiated (#12470)
  • Specific performance warnings from Rust to Python (#12802)
  • Change value_counts resulting column name from counts to count (#12506)
  • Implement std and var for Duration columns (#12865)
  • Change default join behavior with regard to nulls, add join_nulls parameter to keep existing behavior (#12840)
  • Enhance write_database return (indicate the number of rows affected by the operation) (#12830)
  • Add dedicated Decimal selector (#12852)
  • Preserve base dtype when raising to UInt power (#10446)
  • Default to exact checking for integers in assertion utils (#12331)
  • Improve __repr__ implementation for Expr (#12770)
  • Support SQL subqueries for JOIN and FROM (#12819)

🐞 Bug fixes

  • Fix off-by-one error in quantile(method="nearest") (#13058)
  • Fix incorrect schema inference on nested columns (#13057)
  • Don't raise for datetime_range if starting on ambiguous datetime and earliest was specified (#13050)
  • Parse json_decode per max buffer length (#13029)
  • Parse 00:00 time zone as UTC (#13034)
  • Fix timeout errors in concurrent downloads (#13023)
  • Streamline align_frames and fix edge-case where the identical frame object appears more than once (#13007)
  • Fix SQL substring indexing (#13016)
  • Allow broadcasting in ranges (#11900)
  • Prevent deadlock in sink_csv (#12991)
  • Don't get mutable if buffer is sliced (#12979)
  • Support parameterized read_database calls against cursors that only take positional args (#12967)
  • Fix truncate when truncating by multiple weeks (#12948)
  • Fix segfault / memory corruption after plugins return Err result (#12953)
  • Raise a proper Python typed exception when IO writers try to write to a non-existent folder (#12936)
  • Don't panic when ambiguous parameter is not Utf8 (#12913)
  • Raise a proper Python typed exception when the CSV writer tries to write to a non-existent folder (#12919)
  • Patch rolling_var/rolling_std numerical stability (#12909)
  • Fix incorrect Int16 min/max due to incorrect SIMD mask construction (#12908)
  • Improve handling of decimal conversion with to_numpy in the absence of pyarrow (#12888)
  • Fix OOB error in list set operations on empty frame (#12845)
  • Fix error message for uninstantiated Enum types (#12886)
  • Fix repr of Expr.gather (which was still showing deprecated take) (#12864)
  • Fix Array dtype equality (#12853)
  • Fix nan_min/max incorrectly aggregating chunks with addition (#12848)
  • Revert type hint change on expression inputs (#12792)
  • More accurate type hinting for collect_all functions (#12796)
  • Use total float ordering in is_in (#12800)
  • Handle aggregation for all-NaN groups in group_by (#12304)

🛠️ Other improvements

  • Update version switcher for 0.20 (#12844)
  • Add upgrade guide for Python Polars 0.20 (#12872)
  • Run doctests before other tests (#13047)
  • Update describe calculation of min/max (#13027)
  • Minor typo fix (#13003)
  • Resolve two interchange tests failing locally (#12999)
  • Update outdated links to API in Expressions/Functions page (#12981)
  • Expand docstrings for count (#12960)
  • Fix issue with docs for group_by_dynamic (#12906)
  • Prefer explicit --no-cov flag for py3.12/ubuntu test workflow (vs implicit/omitted) (#12889)
  • Scheduled removal of previously deprecated functionality (#12885)
  • Fix references in deprecation notes (#12877)
  • Fix typo in hash docstring (#12879)
  • Fix docstring for deprecated list.take (#12873)
  • Note that list.take is deprecated (#12867)
  • Fix failing tests (#12859)
  • Add quotes to pip install with dependencies (#12799)
  • Fix parameter name reference in update docstring (#12797)

Thank you to all our contributors for making this release possible! @MarcoGorelli, @Object905, @Yerachmiel-Feltzman, @alexander-beedie, @c-peters, @ion-elgreco, @jankislinger, @mcrumiller, @nameexhaustion, @oli-clive-griffin, @orlp, @rancomp, @ritchie46, @romanovacca, @stinodego and @xuestrange