Polars: py-0.20.24 Release

Release date: May 7, 2024
Previous version: py-0.20.23 (released April 28, 2024)
Magnitude: 6,245 Diff Delta
Contributors: 22 total committers

69 Commits in this Release

Authored April 29, 2024

Top Contributors in py-0.20.24

ritchie46
wence-
MarcoGorelli
stinodego
alexander-beedie
CanglongCl
itamarst
nameexhaustion
haocheng6
orlp

Release Notes Published

[!WARNING]
This release was yanked. Please use the 0.20.25 release instead.

🏆 Highlights

  • Support pytorch Tensor and Dataset export with new to_torch DataFrame/Series method (#15931)

🚀 Performance improvements

  • Don't traverse deep datasets that we repr as union in CSE (#16096)
  • Ensure better chunk sizes (#16071)

✨ Enhancements

  • split out rolling_*(..., by='foo') into rolling_*_by('foo', ...) (#16059)
  • add date pattern dd.mm.YYYY (#16045)
  • split Expr.top_k and Expr.top_k_by into separate functions (#16041)
  • Support non-coalescing joins in default engine (#16036)
  • Support pytorch Tensor and Dataset export with new to_torch DataFrame/Series method (#15931)
  • Minor DB type inference updates (#16030)
  • Move diagonal & horizontal concat schema resolving to IR phase (#16034)
  • raise more informative error messages in rolling_* aggregations instead of panicking (#15979)
  • Convert concat during IR conversion (#16016)
  • Improve dynamic supertypes (#16009)
  • Additional uint datatype support for the SQL interface (#15993)
  • Add post-optimization callback (#15972)
  • Support Decimal read from IPC (#15965)
  • Expose plan and expression nodes through NodeTraverser to Python (#15776)
  • Add typed collection from par iterators (#15961)
  • Add by argument for Expr.top_k and Expr.bottom_k (#15468)

🐞 Bug fixes

  • Respect user passed 'reader_schema' in 'scan_csv' (#16080)
  • Lazy csv + projection; respect null values arg (#16077)
  • Materialize dtypes when converting to arrow (#16074)
  • Fix casting decimal to decimal for high precision (#16049)
  • Fix Series constructor failure for Array types for large integers (#16050)
  • Fix printing max scale decimals (#16048)
  • Decimal supertype for dyn int (#16046)
  • Correctly handle large timedelta objects in Series constructor (#16043)
  • Do not close connection just because we're not returning Arrow data in batches (#16031)
  • properly handle nulls in DictionaryArray::iter_typed (#16013)
  • Fix CSE case where upper plan has no projection (#16011)
  • Crash/incorrect group_by/n_unique on categoricals created by (q)cut (#16006)
  • converting from numpy datetime64 and overriding dtype with a different resolution was returning incorrect results (#15994)
  • Ternary supertype dynamics (#15995)
  • Fix PartialEq for DataType::Unknown (#15992)
  • Finish adding typed_lit to help schema determination in SQL "extract" func (#15955)
  • Fix dtype parameter in pandas_to_pyseries function (#15948)
  • do not panic when comparing against categorical with incompatible dtype (#15857)
  • Join validation for multiple keys (#15947)
  • Add missing "truncate_ragged_lines" parameter to read_csv_batched (#15944)

📖 Documentation

  • Ensure consistent docstring warning in fill_nan methods (pointing out that nan isn't null) (#16061)
  • add filter docstring examples to date and datetime (#15996)
  • Fix docstring mistake for polars.concat_str (#15937)
  • Update reference to apply (#15982)
  • Remove unwanted linebreaks from docstrings (#16002)
  • correct default in rolling_* function examples (#16000)
  • Improve user-guide doc of UDF (#15923)
  • update the link to R API docs (#15973)

🛠️ Other improvements

  • Bump sccache action (#16088)
  • Fix failures in test coverage workflow (#16083)
  • Update benchmarks/coverage jobs with "requirements-ci" (#16072)
  • Add TypeGuard to is_polars_dtype util (#16065)
  • Clean up hypothesis decimal strategy (#16056)
  • split Expr.top_k and Expr.top_k_by into separate functions (#16041)
  • Use UnionArgs for DSL side (#16017)
  • Add some comments (#16008)
  • Improve hypothesis strategy for decimals (#16001)
  • Set up TPC-H benchmark tests (#15908)
  • Even more Pyo3 0.21 Bound<> APIs (#15914)
  • Fix failing test (#15936)

Thank you to all our contributors for making this release possible! @CanglongCl, @JulianCologne, @KDruzhkin, @MarcoGorelli, @alexander-beedie, @avimallu, @bertiewooster, @c-peters, @dependabot, @dependabot[bot], @eitsupi, @haocheng6, @itamarst, @luke396, @marenwestermann, @nameexhaustion, @orlp, @ritchie46, @stinodego, @thalassemia, @wence- and @wsyxbcl