Polars: py-0.19.4 Release

Release date: September 27, 2023
Previous version: py-0.19.3 (released September 15, 2023)
Magnitude: 11,733 Diff Delta
Contributors: 16 total committers

90 Commits in this Release

Top Contributors in py-0.19.4

reswqa
ritchie46
alexander-beedie
MarcoGorelli
orlp
Fokko
stinodego
universalmind303
Cheukting
SeanTroyUWO

Release Notes Published

πŸ† Highlights

  • support 'hive partitioning' aware readers (#11284)
  • natively support reading parquet for aws, gcp and azure (#11210)
  • Add support for Iceberg (#10375)
  • The great expressification by @reswqa (#11320, #11344, #11313, #11257, #11288, #11275, #11197, #11167, #11155)

⚠️ Deprecations

  • Add disable_string_cache (#11020)

🚀 Performance improvements

  • improve dynamic_groupby_iter (#11341)
  • improve and fix rolling windows by linear scanning (#11326)
  • faster init from pydantic models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
  • improve outer join materialization (#11241)
  • use ryu and itoa for primitive serialization (#11193)
  • use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
  • Using cache for str.contains regex compilation (#11183)

✨ Enhancements

  • introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
  • Expressify list.shift (#11320)
  • top_k and bottom_k support passing an expr (#11344)
  • add "pyxlsb" engine support to read_excel (for excel binary workbook files) (#11248)
  • support 'hive partitioning' aware readers (#11284)
  • str.strip_chars supports taking an expr argument (#11313)
  • sample n can take an expr (#11257)
  • Add disable_string_cache (#11020)
  • clip supports expr arguments and physical numeric dtype (#11288)
  • Introduce list.drop_nulls (#11272)
  • str.splitn and split_exact can take an expr argument (#11275)
  • introduce ambiguous option for dt.round (#11269)
  • Adds NULLIF and COALESCE SQL functions (#11124)
  • better tree-formatting representation (#11176)
  • natively support reading parquet for aws, gcp and azure (#11210)
  • Expressify str.strip_prefix & suffix (#11197)
  • Add support for Iceberg (#10375)
  • list.join's separator can be expression (#11167)
  • argument every of datetime.truncate can be expression (#11155)

🐞 Bug fixes

  • Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
  • don't panic on multi-nodes in streaming conversion (#11343)
  • ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
  • clarify has_validity docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of null values (#11319)
  • fix empty Series construction edge-case with Struct dtype (#11301)
  • DataFrame init from collections.namedtuple values (#11314)
  • Exclude functools wrapper frames in find_stacklevel (#11292)
  • set partitions independent of thread pool (#11304)
  • address VSCode issue with autocomplete on selector expressions in editor/console (#11235)
  • consume duplicates in rolling_by window (#11261)
  • handle url encoded paths in objectpath creation (#11240)
  • use POOL when writing csv (#11222)
  • don't conflate saved Config JSON string with file path (#11098)
  • is_in for bool evaluated has_false incorrectly (#11217)
  • improve handling of database drivers that can return arrow data (#11201)
  • fix nullable filter mask in group_by (#11207)
  • replace n-th in filter (#11206)
  • fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
  • address unexpected expression name from use of unary - or + operators (#11158)
  • impl hash for more function expr (#11182)
  • list.join's separator can be expression (#11167)
  • Add some missing expr type hint for series (#11171)
  • consistently use negative every as the default for offset in group_by_dynamic (#11164)
  • Make pl.struct serializable (#11169)
  • only raise on actual parameter collision when "dtypes" specified in read_excel "read_csv_options" (#11162)
  • propagate null value for str/binary starts/ends_with and contains (#11141)

πŸ› οΈ Other improvements

  • simplify/clarify group_by_dynamic examples (#11335)
  • tighten assert_frame_equal for LazyFrames (don't collect until after the schema has been checked) (#11331)
  • unify display for namespaced function expr (#11342)
  • add lazy pivot example (#11325)
  • Use GITHUB_TOKEN to get contributor information for docs (#11321)
  • Enable version warning banner (#11322)
  • cross-reference null_count from has_validity (clarifies the correct way to check for nulls) (#11323)
  • Pin pydantic in dev requirements <2.4.0 (#11312)
  • remove default auto-explode for map_many_private (#11270)
  • Add type alias IntoExprColumn (#11296)
  • update a few dependencies (#11283)
  • Properly skip ADBC test (#11282)
  • Fix some minor Makefile issues (#11276)
  • update sponsors (#11271)
  • parametric tests for group_by_rolling (#11262)
  • Make some list function expr non-anonymous (#11230)
  • Mention the performant feature only once (#11223)
  • remove unneeded indirection (#11233)
  • remove unneeded mutex around object-store (#11224)
  • clarify every/period/offset in group_by_dynamic (#11175)
  • Fix read_database batch_size docstring (#11132)

Thank you to all our contributors for making this release possible! @ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303