Performance improvements
- Add/fix unordered row decode, change unordered format (#19284)
- Fast decision for Parquet dictionary encoding (#19256)
- Make date_range / datetime_range ~10x faster for constant durations (#19216)
- Batch utf8-validation in csv (18% / 25% on 1.9.0) (#19124)
- Use two-pass algorithm for csv to ensure correctness and SIMDize more (~17%) (#19088)
Enhancements
- Add SQL support for bit_count and bitwise &, |, and xor operators (#19114)
- Add credential provider utility classes for AWS, GCP (#19297)
- Support decoding Float16 in Parquet (#19278)
- Experimental credential_provider argument for scan_parquet (#19271)
- Allow DeltaTable input to scan_delta and read_delta (#19229)
- New quantile interpolation method & QUANTILE_DISC function in SQL (#19139)
- Conserve Parquet SortingColumns for ints (#19251)
- Low level flight interface (#19239)
- Improved list arithmetic support (#19162)
- Add Expr.struct.unnest() as alias for Expr.struct.field("*") (#19212)
- Add 'drop_empty_rows' parameter for read_ods (#19202)
- Add 'drop_empty_rows' parameter for read_excel (#18253)
- Expose LTS CPU in show_versions() (#19193)
- Check Python version when deserializing UDFs (#19175)
- Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)
- Quantile function in SQL (#18047)
- Improve scalar strict message (#19117)
- Add Series::{first, last, approx_n_unique} (#19093)
- Allow for rolling_*_by to use index count as window (#19071)
- Delay deserialization of python function until physical plan (#19069)
- Add cum(_min/_max) for pl.Boolean (#19061)
Bug fixes
- Don't produce duplicate column names in Series.to_dummies (#19326)
- Use of HAVING outside of GROUP BY should raise a suitable SQLSyntaxError (#19320)
- More accurate from_dicts typing/signature (#19322)
- Fix empty array gather (#19316)
- Merge categorical rev-map in unpivot (#19313)
- DataFrame descending sorting by single list element (#19233)
- Fix cse union schema (#19305)
- Correctly load Parquet statistics for f16 (#19296)
- Error on invalid query (#19303)
- Fix enum scalar output (#19301)
- Fix list gather invalid fast path (#19299)
- Fix quoting style of decimal csv output (#19298)
- Don't vertically parallelize literal select (#19295)
- Fix struct reshape fast path (#19294)
- Also split on forward slashes during hive path inference on Windows (#19282)
- Don't cse as_struct (#19280)
- Only apply string parsing to String dtype (#19222)
- Make the SQLAlchemy connection check more robust (#19270)
- Ensure that read_database takes advantage of Arrow return from a duckdb_engine connection when using a SQLAlchemy Selectable (#19255)
- Compilation error missing use JsonLineReader (#19244)
- Don't remember Parquet statistics if filtered (#19248)
- Do not check dtypes of non-projected columns for parquet (#19254)
- Parquet predicate pushdown for lit(_) != (#19246)
- Use all chunks in Series from arrow struct (#19218)
- Don't trigger row limit in array construction (#19215)
- Fix struct literals (#19214)
- Plotting was not interacting well with Altair schema wrappers (#19213)
- Fixing infer_schema for DataType::Null (#19201)
- Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
- Add 'drop_empty_rows' parameter for read_excel (#18253)
- Don't unwrap() expansion (#19196)
- Properly handle non-nullable nested Parquet (#19192)
- Fix invalid list collection in expression engine (#19191)
- Fix use of "hidden_columns" parameter in write_excel (#19029)
- Implement to_arrow functionality properly for Arrays (#19077)
- Remove incorrect warning when using an IO[bytes] instance (#19154)
- Don't fail test if e.g. jax has been used first, since jax installs a fork handler that warns (#19178)
- Fix incorrect (eq|ne)_missing on List/Array types (#19155)
- Properly broadcast Struct when/then validity (#19148)
- Allow partial name overlap in join_where resolution (#19128)
- Fix floordiv / modulo with scalar 0 on LHS (#19143)
- Ensure aligned chunks in OOC sort (#19118)
- Recursively align when converting to ArrowArray (#19097)
- Raise on invalid shape of shape 1, empty combination (#19113)
- Use two-pass algorithm for csv to ensure correctness and SIMDize more (~17%) (#19088)
- Allow converting DatetimeOwned to ChunkedArray (#19094)
- Throw proper error for empty char params in scan_csv (#19100)
- Ensure parquet schema arg is propagated to IR (#19084)
- Only rewrite numeric ineq joins (#19083)
- Check validity of columns of keys/aggs in dsl->ir (#19082)
- Bitwise aggregations should ignore null values (#19067)
- Remove failing datetime subclass test (#19068)
- Don't ignore multiple columns in LazyFrame.unnest (#19035)
Documentation
- Remove ecosystem viz section since there is one in misc already (#18408)
- Fix typo in custom expressions docs (#19292)
- Add SQL docs for new QUANTILE_CONT and QUANTILE_DISC functions (#19272)
- Add marimo to ecosystem.md (#19250)
- Improve DataFrame.write_database docstring (#19189)
- Link to main website from banner (#19177)
- Fix example of as_struct (#19116)
- Clarify difference between bitwise/logical ops (#19180)
- Add non-equi joins to, and revise, joins docs page (#19127)
- Add Series.first, last, approx_n_unique to docs (#19146)
- Annotate Config kwarg options (#18988)
- Revise and improve 'Concepts' section (#19087)
Other improvements
- Add/fix unordered row decode, change unordered format (#19284)
- Move from parquet-format-safe to polars-parquet-format (#19275)
- Skip flaky test (#19242)
- Add more tests for list arithmetic (#19225)
- Remove unused IPC async (#19223)
- Make get_list_builder infallible (#19217)
- Migrate to PyO3 0.22 and released version of rust-numpy crate (#19199)
- Make expression output type known (#19195)
- Revert "feat(python): Raise an error when users try to use Polars API in a fork()-without-execve() child (#19149)" (#19188)
- Zero-Field Structs and DataFrame with Height Property (#19123)
- Make pl.repeat part of the IR (#19152)
- Expose IEJoin IR node to python (#19104)
- Clean remove_prefix since python3.9 is now the minimum Python (#19070)
- Add new streaming engine to CI (#19051)
Thank you to all our contributors for making this release possible!
@Bidek56, @MarcoGorelli, @Rashik-raj, @adamreeve, @alexander-beedie, @alonme, @balbok0, @coastalwhite, @deanm0000, @dependabot, @dependabot[bot], @eitsupi, @etrotta, @itamarst, @jbutterwick, @joelostblom, @kenkoooo, @khalidmammadov, @laurentS, @mcrumiller, @mscolnick, @nameexhaustion, @orlp, @pomo-mondreganto, @ritchie46, @rodrigogiraoserrao, @siddharth-vi, @stinodego, @sunadase and @wence-