💥 Unstable breaking changes
These APIs were marked unstable and are allowed to change.
- Use Altair in DataFrame.plot (#17995)
🚀 Performance improvements
- Do not copy uncompressed Parquet pages (#18441)
- Several large Parquet optimizations (#18437)
- Batch Plain Parquet UTF-8 verification (#18397)
- Partition metadata for Parquet statistics loading (#18343)
- Fix accidentally quadratic Parquet metadata handling (#18327)
- Lazily decompress Parquet pages (#18326)
- Don't rechunk aligned chunks in owned_binary_chunk_align (#18314)
- Batch DELTA_LENGTH_BYTE_ARRAY decoding (#18299)
- Slice pushdown for SimpleProjection (#18296)
- Use direct path for time/timedelta literals (#18223)
- Speed up ndjson reader by ~40% (#18197)
- Skip Parquet pages that are not needed (#18192)
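In Parquet's DELTA_LENGTH_BYTE_ARRAY encoding, all value lengths are stored first (themselves DELTA_BINARY_PACKED), followed by the concatenated bytes of every value, so a decoder can compute all offsets in one batch instead of handling values one at a time. The sketch below assumes the lengths have already been decoded; the helper name split_delta_length_values is hypothetical and is not part of Polars.

```python
from itertools import accumulate
from typing import List, Sequence


def split_delta_length_values(lengths: Sequence[int], data: bytes) -> List[bytes]:
    """Slice the concatenated payload of a DELTA_LENGTH_BYTE_ARRAY page.

    Hypothetical helper: assumes `lengths` was already decoded from the
    DELTA_BINARY_PACKED prefix and that `data` holds exactly the value bytes.
    """
    # Batch step: turn all lengths into offsets up front.
    offsets = [0, *accumulate(lengths)]
    if offsets[-1] != len(data):
        raise ValueError("lengths do not match the size of the data buffer")
    # Each value is a contiguous slice of the shared buffer.
    return [data[start:end] for start, end in zip(offsets, offsets[1:])]


values = split_delta_length_values([5, 0, 3], b"helloabc")
```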
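Batching UTF-8 verification (#18397) rests on a simple observation: validating the whole concatenated buffer once is cheaper than validating each string, but a valid buffer can still be cut in the middle of a multi-byte character, so every value boundary must also be checked. A minimal pure-Python sketch of that idea; the function name is hypothetical and this is not the Polars implementation.

```python
from itertools import accumulate
from typing import Sequence


def validate_utf8_batch(lengths: Sequence[int], data: bytes) -> bool:
    """Check that every value in a concatenated string buffer is valid UTF-8."""
    offsets = [0, *accumulate(lengths)]
    if offsets[-1] != len(data):
        return False
    try:
        data.decode("utf-8")  # one validation pass over the whole buffer
    except UnicodeDecodeError:
        return False
    # A value may not start on a continuation byte (0b10xxxxxx); otherwise a
    # multi-byte character would be split between two values.
    return all(
        start == len(data) or (data[start] & 0xC0) != 0x80
        for start in offsets[:-1]
    )
```

Checking only the value start offsets is sufficient here: once the whole buffer is valid and no value begins on a continuation byte, each slice consists of complete characters.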
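Skipping pages that a query does not need (#18192) can be illustrated with row ranges alone. This sketch assumes each page's row count is known and that the query only needs rows in [offset, offset + length); real readers may also consult page statistics, and the function name is hypothetical.

```python
from typing import List, Sequence


def pages_for_slice(page_row_counts: Sequence[int], offset: int, length: int) -> List[int]:
    """Return the indices of pages that overlap rows [offset, offset + length)."""
    if length <= 0:
        return []
    stop = offset + length
    needed = []
    page_start = 0
    for index, rows in enumerate(page_row_counts):
        page_end = page_start + rows
        if page_start >= stop:
            break  # every remaining page lies past the slice
        if page_end > offset:
            needed.append(index)  # only these pages need to be decompressed
        page_start = page_end
    return needed
```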
✨ Enhancements
- Use Altair in DataFrame.plot (#17995)
- Allow a mapping as syntactic sugar in str.replace_many (#18214)
- Respect input time zone if the input is a pandas Timestamp (#18346)
- Improve Schema and DataType interop with Python types (#18308)
- Add POLARS_BACKTRACE_IN_ERR for debugging (#18333)
- IR serde (#18298)
- Improve decimal_comma error message (#18269)
- Support pre-signed URLs for cloud scan (#18274)
- Support the most recent version of "duckdb_engine" connections via read_database (#18277)
- Support empty structs (#18249)
- Allow float values in the by column of interpolate_by (#18015)
- Make show_versions more responsive (#18208)
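The replace_many change (#18214) lets a single mapping stand in for separate pattern and replacement lists. The pure-Python stand-in below only illustrates that calling convention; it is hypothetical, works on a single string rather than a Polars column, and resolves overlapping patterns leftmost-first via regex alternation, which is an assumption about the matching semantics.

```python
import re
from collections.abc import Mapping
from typing import Optional, Sequence, Union


def replace_many(
    text: str,
    patterns: Union[Mapping, Sequence[str]],
    replace_with: Optional[Sequence[str]] = None,
) -> str:
    # Syntactic sugar: a mapping is unpacked into two parallel sequences.
    if isinstance(patterns, Mapping):
        if replace_with is not None:
            raise TypeError("pass a mapping or two sequences, not both")
        patterns, replace_with = list(patterns.keys()), list(patterns.values())
    if replace_with is None or len(patterns) != len(replace_with):
        raise ValueError("patterns and replacements must have the same length")
    if not patterns:
        return text
    lookup = dict(zip(patterns, replace_with))
    # Literal (escaped) patterns; at each position the first listed match wins.
    regex = re.compile("|".join(re.escape(p) for p in patterns))
    return regex.sub(lambda m: lookup[m.group(0)], text)
```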
🐞 Bug fixes
- Enable CSE in eager if structs are expanded (#18426)
- Treat explode as gather (#18431)
- Handle Parquet nested values that span several pages (#18407)
- Support reading empty parquet files (#18392)
- Recurse on map field during type conversion (#15075)
- Allow search_sorted on boolean series (#18387)
- Mark Expr.(lower|upper)_bound as returning scalar (#18383)
- Fix compressed ndjson row count (#18371)
- Use correct column names when there are no value columns in unpivot (#18340)
- Fix several smaller Parquet issues (#18325)
- Fix group-by slice on all keys (#18324)
- Compute joint null mask before calling rolling corr/cov stats (#18246)
- Fix several scan_parquet(parallel='prefiltered') problems (#18278)
- Add missing imports for the json feature flag (#18305)
- Check groups in group-by filter (#18300)
- Parquet delta encoding for 0-bitwidth miniblocks (#18289)
- Arguments for upsample only have to be sorted within groups (#18264)
- Use appropriate bins in hist when bin_count is specified (#16942)
- Raise a suitable error on unsupported SQL set operation syntax (#18205)
- Fix invalid state due to cached IR (#18262)
- Fix failed AWS credential load from '~/.aws/credentials' due to formatting (#18259)
- Fix panic streaming parquet scan from cloud with slice (#18202)
- Consistently round half-way points down in dt.round (#18245)
- Fix duplicate column output and panic for include_file_paths (#18255)
- Fix unit null rank (#18252)
- Use physical for row-encoding (#18251)
- Convert date and datetime in literal construction (#16018)
- Fix gather str as lit (#18207)
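Treating explode as a gather (#18431) means an explode can be expressed as repeating each row index once per list element and then gathering every other column with those indices. A minimal pure-Python sketch over dict-of-lists columns; the function is hypothetical, and the rule that an empty or null list produces one null row is an assumption made here for illustration.

```python
from typing import Any, Dict, List


def explode(columns: Dict[str, List[Any]], list_col: str) -> Dict[str, List[Any]]:
    gather_idx: List[int] = []
    flat: List[Any] = []
    for row, values in enumerate(columns[list_col]):
        if values:
            gather_idx.extend([row] * len(values))
            flat.extend(values)
        else:
            # Assumption: an empty or null list yields a single null row.
            gather_idx.append(row)
            flat.append(None)
    # Every other column is a plain gather with the repeated row indices.
    out = {
        name: [col[i] for i in gather_idx]
        for name, col in columns.items()
        if name != list_col
    }
    out[list_col] = flat
    return out
```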
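Booleans order naturally with False before True, so a sorted boolean Series (#18387) can be binary-searched like any other sorted column. The stdlib sketch below mirrors the idea with bisect; it assumes no nulls, and the helper and its "left"/"right" side naming are illustrative only.

```python
from bisect import bisect_left, bisect_right
from typing import Sequence


def search_sorted_bool(values: Sequence[bool], element: bool, side: str = "left") -> int:
    """Insertion index for `element` in an ascending, null-free boolean sequence."""
    if side == "left":
        return bisect_left(values, element)
    if side == "right":
        return bisect_right(values, element)
    raise ValueError("side must be 'left' or 'right'")
```

For example, searching for True with side="left" returns the number of False values.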
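Counting rows in a compressed ndjson file (#18371) has to happen on the decompressed bytes; newline bytes in the compressed stream say nothing about the number of records. A stdlib sketch, assuming gzip compression detected by its magic bytes and one JSON document per non-blank line (the helper is hypothetical):

```python
import gzip


def count_ndjson_rows(raw: bytes) -> int:
    # Assumption: gzip input is recognised by its two magic bytes 0x1f 0x8b.
    if raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    # One row per non-blank line; a missing trailing newline still counts.
    return sum(1 for line in raw.split(b"\n") if line.strip())
```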
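Computing a joint null mask first (#18246) means a window position only contributes to rolling correlation or covariance when both inputs are non-null there, so the two means are taken over the same rows. A pure-Python sketch for covariance; its window handling (None until a full window, None when too few valid pairs remain) is a simplification and not Polars' exact min_periods semantics.

```python
from typing import List, Optional, Sequence


def rolling_cov(
    x: Sequence[Optional[float]],
    y: Sequence[Optional[float]],
    window: int,
    ddof: int = 1,
) -> List[Optional[float]]:
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    out: List[Optional[float]] = []
    for i in range(len(x)):
        if i + 1 < window:
            out.append(None)  # window not yet full
            continue
        lo = i + 1 - window
        # Joint null mask: keep a position only if BOTH values are present.
        pairs = [
            (a, b)
            for a, b in zip(x[lo : i + 1], y[lo : i + 1])
            if a is not None and b is not None
        ]
        n = len(pairs)
        if n - ddof <= 0:
            out.append(None)
            continue
        mean_x = sum(a for a, _ in pairs) / n
        mean_y = sum(b for _, b in pairs) / n
        out.append(sum((a - mean_x) * (b - mean_y) for a, b in pairs) / (n - ddof))
    return out
```

Without the joint mask, the last window would average y over [2.0, 3.0, 4.0] while x only contributes [2.0, 3.0], mixing statistics from different rows.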
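In Parquet's DELTA_BINARY_PACKED encoding, each miniblock stores deltas relative to a block-level minimum delta, bit-packed with a per-miniblock bit width. A bit width of 0 (#18289) means the miniblock carries no payload bytes and every delta equals the minimum delta. A minimal sketch of unpacking one miniblock and rebuilding values, assuming the header fields are already parsed and the LSB-first packing used by Parquet's bit-packed runs (function names are hypothetical):

```python
from typing import List


def unpack_miniblock(buf: bytes, bit_width: int, count: int, min_delta: int) -> List[int]:
    """Decode `count` deltas from one bit-packed miniblock (LSB-first packing)."""
    if bit_width == 0:
        # No payload bytes: every delta is exactly min_delta.
        return [min_delta] * count
    n_bytes = (bit_width * count + 7) // 8
    if len(buf) < n_bytes:
        raise ValueError("miniblock buffer is too short")
    packed = int.from_bytes(buf[:n_bytes], "little")
    mask = (1 << bit_width) - 1
    return [min_delta + ((packed >> (i * bit_width)) & mask) for i in range(count)]


def apply_deltas(first_value: int, deltas: List[int]) -> List[int]:
    """Rebuild values as a running sum starting from the block's first value."""
    values = [first_value]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values
```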
📖 Documentation
- Add date_range and datetime_ranges examples without eager=True (#18379)
- Fix incorrect comments in group_by_dynamic (#18415)
- Alphabetise methods in Python API reference (#18380)
- Document POLARS_BACKTRACE_IN_ERR env var (#18354)
- Add missing aggregation entries (#18334) (#18341)
- Add missing Series methods to API reference (#18312)
- Document DataFrame.__getitem__ and Series.__getitem__ (#18309)
- Fix typos and add "See also" links to struct name expressions (#18282)
- Improve decimal_comma error message (#18269)
- Clarify coalesce behaviour in join_asof (#18273)
- Add note to Expr.shuffle differentiating it from the DataFrame method (#18266)
- Improve formatting and consistency of various docstrings (#18237)
- Add missing "Parameters" section to the bin.size expression docstring (#18222)
- Fix column name output in example of DataFrame.map_rows (#18227)
📦 Build system
- Bump Rust toolchain to nightly-2024-08-26 (#18370)
🛠️ Other improvements
- Address spurious hypothesis test failure (#18434)
- Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet (#18331)
- Fix the required version of Rust in README.md (#18357)
- Remove unused Parquet indexes (#18329)
- Deprecate serialize json for LazyFrame (#18283)
- Don't add sink node to cloud query (#18280)
- Split py-polars crate (#18204)
- Fix test for new deltalake release (#18211)
- Update the required version of Rust in README.md (#18203)
- Fix version bifurcation for test_read_database_cx_credentials (#18220)
- Use or_else for raising (#18206)
- Remove unused Parquet source files (#18193)
Thank you to all our contributors for making this release possible!
@BartSchuurmans, @ChayimFriedman2, @MarcoGorelli, @StepfenShawn, @agossard, @alexander-beedie, @cgbur, @coastalwhite, @corwinjoy, @deanm0000, @henryharbeck, @ion-elgreco, @jqnatividad, @krasnobaev, @liufeimath, @markxwang, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stinodego, @sunadase, @thomascamminady and @wence-