Polars: py-1.10.0 Release

Release date: October 20, 2024
Previous version: py-1.9.0 (released October 1, 2024)
Magnitude: 16,079 Diff Delta
Contributors: 29 total committers

115 Commits in this Release

Top Contributors in py-1.10.0

coastalwhite
nameexhaustion
orlp
rodrigogiraoserrao
ritchie46
pomo-mondreganto
MarcoGorelli
alexander-beedie
alonme
stinodego

Release Notes Published

🚀 Performance improvements

  • Add/fix unordered row decode, change unordered format (#19284)
  • Fast decision for Parquet dictionary encoding (#19256)
  • Make date_range / datetime_range ~10x faster for constant durations (#19216)
  • Batch utf8-validation in csv 18% / 25% on 1.9.0 (#19124)
  • Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)
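
As a rough illustration of why constant durations leave room for a fast path in date_range / datetime_range (#19216): when every step has the same fixed length, each element is simply the start plus a multiple of the step, so the whole range reduces to integer arithmetic with no calendar logic. The sketch below is plain Python and is not the Polars implementation; the function name constant_step_range is hypothetical.

```python
from datetime import datetime, timedelta


def constant_step_range(start, end, step):
    """Closed range [start, end] for a fixed-length step (hypothetical helper).

    Works purely on integer microsecond counts, which is only valid when
    the step is a constant duration (hours, fixed-length days), not a
    calendar interval such as a month.
    """
    one_us = timedelta(microseconds=1)
    step_us = step // one_us  # step length as an integer
    if step_us <= 0:
        raise ValueError("step must be a positive duration")
    span_us = (end - start) // one_us
    if span_us < 0:
        return []
    count = span_us // step_us + 1  # number of elements, known up front
    return [start + timedelta(microseconds=i * step_us) for i in range(count)]


values = constant_step_range(
    datetime(2024, 10, 1), datetime(2024, 10, 1, 12), timedelta(hours=6)
)
```

Because the element count is known before any value is produced, the output can be allocated once and filled in a tight loop, which is the general property a constant-duration fast path can exploit.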

✨ Enhancements

  • Add SQL support for bit_count and bitwise &, |, and xor operators (#19114)
  • Add credential provider utility classes for AWS, GCP (#19297)
  • Support decoding Float16 in Parquet (#19278)
  • Experimental credential_provider argument for scan_parquet (#19271)
  • Allow DeltaTable input to scan_delta and read_delta (#19229)
  • New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
  • Conserve Parquet SortingColumns for ints (#19251)
  • Low level flight interface (#19239)
  • Improved list arithmetic support (#19162)
  • Add Expr.struct.unnest() as alias for Expr.struct.field("*") (#19212)
  • Add 'drop_empty_rows' parameter for read_ods (#19202)
  • Add 'drop_empty_rows' parameter for read_excel (#18253)
  • Expose LTS CPU in show_versions() (#19193)
  • Check Python version when deserializing UDFs (#19175)
  • Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)
  • Quantile function in SQL (#18047)
  • Improve scalar strict message (#19117)
  • Add Series::{first, last, approx_n_unique} (#19093)
  • Allow for rolling_*_by to use index count as window (#19071)
  • Delay deserialization of python function until physical plan (#19069)
  • Add cum(_min/_max) for pl.Boolean (#19061)
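
The difference between the discrete and continuous quantile functions mentioned above (QUANTILE_DISC in #19139, QUANTILE_CONT in the documentation entry #19272) can be sketched in plain Python. This follows the usual SQL percentile_disc / percentile_cont definitions and is an assumption about the intended semantics, not a copy of the Polars code; edge-case handling in Polars may differ.

```python
import math


def quantile_disc(values, q):
    """Discrete quantile: always returns a value present in the data."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    data = sorted(v for v in values if v is not None)  # nulls are skipped
    if not data:
        return None
    # Smallest value whose cumulative share of the data is at least q.
    idx = max(math.ceil(q * len(data)) - 1, 0)
    return data[idx]


def quantile_cont(values, q):
    """Continuous quantile: linearly interpolates between neighbours."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    data = sorted(v for v in values if v is not None)
    if not data:
        return None
    pos = q * (len(data) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


sample = [4, 1, None, 3, 2]
median_disc = quantile_disc(sample, 0.5)
median_cont = quantile_cont(sample, 0.5)
```

For an even number of non-null values the two medians differ: the discrete form picks an existing element, while the continuous form returns the midpoint between the two central elements.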

🐞 Bug fixes

  • Don't produce duplicate column names in Series.to_dummies (#19326)
  • Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError (#19320)
  • More accurate from_dicts typing/signature (#19322)
  • Fix empty array gather (#19316)
  • Merge categorical rev-map in unpivot (#19313)
  • DataFrame descending sorting by single list element (#19233)
  • Fix cse union schema (#19305)
  • Correctly load Parquet statistics for f16 (#19296)
  • Error on invalid query (#19303)
  • Fix enum scalar output (#19301)
  • Fix list gather invalid fast path (#19299)
  • Fix quoting style of decimal csv output (#19298)
  • Don't vertically parallelize literal select (#19295)
  • Fix struct reshape fast path (#19294)
  • Also split on forward slashes during hive path inference on Windows (#19282)
  • Don't cse as_struct (#19280)
  • Only apply string parsing to String dtype (#19222)
  • Make the SQLAlchemy connection check more robust (#19270)
  • Ensure that read_database takes advantage of Arrow return from a duckdb_engine connection when using a SQLAlchemy Selectable (#19255)
  • Compilation error missing use JsonLineReader (#19244)
  • Don't remember Parquet statistics if filtered (#19248)
  • Do not check dtypes of non-projected columns for parquet (#19254)
  • Parquet predicate pushdown for lit(_) != (#19246)
  • Use all chunks in Series from arrow struct (#19218)
  • Don't trigger row limit in array construction (#19215)
  • Fix struct literals (#19214)
  • Plotting was not interacting well with Altair schema wrappers (#19213)
  • Fixing infer_schema for DataType::Null (#19201)
  • Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
  • Add 'drop_empty_rows' parameter for read_excel (#18253)
  • Don't unwrap() expansion (#19196)
  • Properly handle non-nullable nested Parquet (#19192)
  • Fix invalid list collection in expression engine (#19191)
  • Fix use of "hidden_columns" parameter in write_excel (#19029)
  • Implement to_arrow functionality properly for Arrays (#19077)
  • Remove incorrect warning when using an IO[bytes] instance (#19154)
  • Don't fail test if e.g. jax has been used first, since jax installs a fork handler that warns (#19178)
  • Fix incorrect (eq|ne)_missing on List/Array types (#19155)
  • Properly broadcast Struct when then validity (#19148)
  • Allow partial name overlap in join_where resolution (#19128)
  • Fix floordiv / modulo with scalar 0 on LHS (#19143)
  • Ensure aligned chunks in OOC sort (#19118)
  • Recursively align when converting to ArrowArray (#19097)
  • Raise on invalid shape of shape 1, empty combination (#19113)
  • Use two-pass algorithm for csv to ensure correctness and SIMDize more ~17% (#19088)
  • Allow converting DatetimeOwned to ChunkedArray (#19094)
  • Throw proper error for empty char params in scan_csv (#19100)
  • Ensure parquet schema arg is propagated to IR (#19084)
  • Only rewrite numeric ineq joins (#19083)
  • Check validity of columns of keys/aggs in dsl->ir (#19082)
  • Bitwise aggregations should ignore null values (#19067)
  • Remove failing datetime subclass test (#19068)
  • Don't ignore multiple columns in LazyFrame.unnest (#19035)
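
To make the null-handling fix for bitwise aggregations (#19067) concrete, here is a small plain-Python sketch of an aggregation that skips missing values. The helper name bitwise_agg is hypothetical, and returning None for an all-null input is an assumption for the sketch rather than documented Polars behaviour.

```python
from functools import reduce
from operator import and_, or_, xor


def bitwise_agg(values, op):
    """Fold `op` over the non-null values; None stands in for null."""
    present = [v for v in values if v is not None]
    if not present:
        return None  # assumption: nothing to aggregate yields null
    return reduce(op, present)


column = [0b1100, None, 0b1010]
agg_and = bitwise_agg(column, and_)
agg_or = bitwise_agg(column, or_)
agg_xor = bitwise_agg(column, xor)
```

The null in the middle of the column has no effect on any of the three results, which is the behaviour the bug-fix title describes.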

📖 Documentation

  • Remove ecosystem viz section since there is one in misc already (#18408)
  • Fix typo in custom expressions docs (#19292)
  • Add SQL docs for new QUANTILE_CONT and QUANTILE_DISC functions (#19272)
  • Add marimo to ecosystem.md (#19250)
  • Improve DataFrame.write_database docstring (#19189)
  • Link to main website from banner (#19177)
  • Fix example of as_struct (#19116)
  • Clarify difference between bitwise/logical ops (#19180)
  • Add non-equi joins to, and revise, joins docs page (#19127)
  • Add Series.first,last,approx_n_unique to docs (#19146)
  • Annotate Config kwarg options (#18988)
  • Revise and improve 'Concepts' section (#19087)

🛠️ Other improvements

  • Add/fix unordered row decode, change unordered format (#19284)
  • Move from parquet-format-safe to polars-parquet-format (#19275)
  • Skip flaky test (#19242)
  • Add more tests for list arithmetic (#19225)
  • Remove unused IPC async (#19223)
  • Make get_list_builder infallible (#19217)
  • Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
  • Make expression output type known (#19195)
  • Revert "feat(python): Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)" (#19188)
  • Zero-Field Structs and DataFrame with Height Property (#19123)
  • Make pl.repeat part of the IR (#19152)
  • Expose IEJoin IR node to python (#19104)
  • Clean remove_prefix since python3.9 is now the minimum Python (#19070)
  • Add new streaming engine to CI (#19051)

Thank you to all our contributors for making this release possible! @Bidek56, @MarcoGorelli, @Rashik-raj, @adamreeve, @alexander-beedie, @alonme, @balbok0, @coastalwhite, @deanm0000, @dependabot, @dependabot[bot], @eitsupi, @etrotta, @itamarst, @jbutterwick, @joelostblom, @kenkoooo, @khalidmammadov, @laurentS, @mcrumiller, @mscolnick, @nameexhaustion, @orlp, @pomo-mondreganto, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego, @sunadase and @wence-