Polars: py-1.6.0 Release

Release date: August 29, 2024
Previous version: py-1.5.0 (released August 14, 2024)
Magnitude: 7,352 Diff Delta
Contributors: 26 total committers

101 Commits in this Release

Top Contributors in py-1.6.0

orlp
ritchie46
MarcoGorelli
coastalwhite
r-brink
nameexhaustion
alexander-beedie
henryharbeck
mcrumiller
stinodego

Release Notes Published

💥 Unstable Breaking changes

These APIs were marked unstable and are allowed to change.

  • Use Altair in DataFrame.plot (#17995)

🚀 Performance improvements

  • Parquet do not copy uncompressed pages (#18441)
  • Several large parquet optimizations (#18437)
  • Batch Plain Parquet UTF-8 verification (#18397)
  • Partition metadata for parquet statistic loading (#18343)
  • Fix accidental quadratic parquet metadata (#18327)
  • Lazy decompress Parquet pages (#18326)
  • Don't rechunk aligned chunks in owned_binary_chunk_align (#18314)
  • Batch DELTA_LENGTH_BYTE_ARRAY decoding (#18299)
  • Slice pushdown for SimpleProjection (#18296)
  • Use direct path for time/timedelta literals (#18223)
  • Speedup ndjson reader ~40% (#18197)
  • Skip parquet page when unneeded (#18192)

✨ Enhancements

  • Use Altair in DataFrame.plot (#17995)
  • Allow mapping as syntactic sugar in str.replace_many (#18214)
  • Respect input time zone if input is pandas Timestamp (#18346)
  • Improve Schema and DataType interop with Python types (#18308)
  • Add POLARS_BACKTRACE_IN_ERR for debugging (#18333)
  • IR serde (#18298)
  • Improve decimal_comma error message (#18269)
  • Support pre-signed URLs for cloud scan (#18274)
  • Support the most recent version of "duckdb_engine" connections via read_database (#18277)
  • Support empty structs (#18249)
  • Allow float in interpolate_by by column (#18015)
  • Make show_versions more responsive (#18208)

🐞 Bug fixes

  • Enable CSE in eager if struct are expanded (#18426)
  • Treat explode as gather (#18431)
  • Parquet nested values that span several pages (#18407)
  • Support reading empty parquet files (#18392)
  • Recurse on map field during type conversion (#15075)
  • Allow search_sorted on boolean series (#18387)
  • Mark Expr.(lower|upper)_bound as returning scalar (#18383)
  • Fix compressed ndjson row count (#18371)
  • Use correct column names when there are no value columns in unpivot (#18340)
  • Parquet several smaller issues (#18325)
  • Fix group-by slice on all keys (#18324)
  • Compute joint null mask before calling rolling corr/cov stats (#18246)
  • Several scan_parquet(parallel='prefiltered') problems (#18278)
  • Json feature flag missing imports (#18305)
  • Check groups in group-by filter (#18300)
  • Parquet delta encoding for 0-bitwidth miniblocks (#18289)
  • Arguments for upsample only have to be sorted within groups (#18264)
  • Use appropriate bins in hist when bin_count specified (#16942)
  • Raise suitable error on unsupported SQL set op syntax (#18205)
  • Fix invalid state due to cached IR (#18262)
  • Fix failed AWS credential load from '~/.aws/credentials' due to formatting (#18259)
  • Fix panic streaming parquet scan from cloud with slice (#18202)
  • Consistently round half-way points down in dt.round (#18245)
  • Fix duplicate column output and panic for include_file_paths (#18255)
  • Fix unit null rank (#18252)
  • Use physical for row-encoding (#18251)
  • Convert date and datetime in literal construction (#16018)
  • Fix gather str as lit (#18207)

📖 Documentation

  • Add date_range and datetime_ranges examples without eager=True (#18379)
  • Fix incorrect comments in group_by_dynamic (#18415)
  • Alphabetise methods in Python API reference (#18380)
  • Document POLARS_BACKTRACE_IN_ERR env var (#18354)
  • Add missing aggregation entries (#18334) (#18341)
  • Add missing Series methods to API reference (#18312)
  • Document DataFrame.__getitem__ and Series.__getitem__ (#18309)
  • Fix typos and add see also links to struct name expressions (#18282)
  • Improve decimal_comma error message (#18269)
  • Clarify coalesce behaviour in join_asof (#18273)
  • Add note to Expr.shuffle differentiating from df method (#18266)
  • Improve formatting and consistency of various docstrings (#18237)
  • Add missing "Parameters" section to bin.size expr docstring (#18222)
  • Fix column name output in example of DataFrame.map_rows (#18227)

📦 Build system

  • Bump Rust toolchain to nightly-2024-08-26 (#18370)

🛠️ Other improvements

  • Address spurious hypothesis test failure (#18434)
  • Turn all Binary/Utf8 into BinaryView/Utf8View in Parquet (#18331)
  • Fix the required version of rust in README.md (#18357)
  • Remove unused Parquet indexes (#18329)
  • Deprecate serialize json for LazyFrame (#18283)
  • Don't add sink node to cloud query (#18280)
  • Split py-polars crate (#18204)
  • Fix test for new deltalake release (#18211)
  • Update the required version of rust in README.md (#18203)
  • Fix version bifurcation for test_read_database_cx_credentials (#18220)
  • Use or_else for raising (#18206)
  • Remove unused Parquet source files (#18193)

Thank you to all our contributors for making this release possible! @BartSchuurmans, @ChayimFriedman2, @MarcoGorelli, @StepfenShawn, @agossard, @alexander-beedie, @cgbur, @coastalwhite, @corwinjoy, @deanm0000, @henryharbeck, @ion-elgreco, @jqnatividad, @krasnobaev, @liufeimath, @markxwang, @mcrumiller, @nameexhaustion, @orlp, @ritchie46, @stinodego, @sunadase, @thomascamminady and @wence-