Extremely fast Query Engine for DataFrames, written in Rust
Polars py-1.0.0-rc.2
Release date: June 24, 2024
Previous version: py-1.0.0-rc.1 (released June 23, 2024)
Magnitude: 2,196 Diff Delta
Contributors: 10 total committers
Commits: 18 Commits in this Release
Top Contributors in py-1.0.0-rc.2
coastalwhite
alexander-beedie
adamreeve
ritchie46
datapythonista
orlp
lukeshingles
eitsupi
nameexhaustion
mcrumiller
Release Notes Published
Breaking changes
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Split `replace` functionality into two separate functions (#16921)
- Default to writing binview data to IPC (#17084)
- Remove re-export of type aliases (#17032)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Rename `ModuleUpgradeRequired` and `PolarsPanicError` error, remove `InvalidAssert` error (#17033)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Properly apply `strict` parameter in Series constructor (#16939)
- Remove supertype definition of List and non-List types (#16918)
- Consistently convert to given time zone in Series constructor (#16828)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get/gather` operations (#16841)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Update `set_sorted` to only accept a single column (#16800)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Update some error types to more appropriate variants (#15030)
- Scheduled removal of deprecated functionality (#16715)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len/value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
Deprecations
- Deprecate `size` parameter in parametric testing strategies in favor of `min_size/max_size` (#17128)
- Split `replace` functionality into two separate functions (#16921)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- Remove re-export of exceptions at top-level (#17059)
- Deprecate `dt.mean/dt.median` in favor of `mean/median` (#16888)
- Deprecate `LazyFrame.with_context` in favor of horizontal concatenation (#16860)
- Rename parameter `descending` to `reverse` in `top_k` methods (#16817)
- Rename `str.concat` to `str.join` and update default delimiter (#16790)
- Deprecate `arctan2d` in favor of `arctan2(...).degrees()` (#16786)
Performance improvements
- create UniqueKernel and improve bool implementation (#17160)
- parallel linearize in new streaming engine (#17050)
- Default to writing binview data to IPC (#17084)
- Parallelize arrow conversion if binview -> large_bin (#17083)
- GC buffers in if_then_else view kernel (#16993)
- Desugar `AND` filter into multiple nodes (#16992)
- Optimize generic argsort of row-encoding (#16894)
- Improve rle_id iteration perf and set sorted flags (#16893)
- Optimize string/binary sort (#16871)
- Use `split_at` in `split` (#16865)
- Use `split_at` instead of double slice in chunk splits. (#16856)
- Don't rechunk in `align_` if arrays are aligned (#16850)
- Don't create small chunks in parallel collect. (#16845)
- Add dedicated no-null branch in `arg_sort` (#16808)
- Speed up `dt.offset_by` 2x for constant durations (#16728)
- Toggle coalesce in `join` if non-coalesced key isn't projected (#16677)
- Make `dt.truncate` 1.5x faster when `every` is just a single duration (and not an expression) (#16666)
- Always prune unused columns in semi/anti join (#16665)
Enhancements
- Support reading byte stream split encoded floats and doubles in parquet (#17099)
- Add `float_scientific` option to `write_csv/sink_csv` (#17111)
- Support `Struct` field selection in the SQL engine, `RENAME` and `REPLACE` select wildcard options (#17109)
- Update `DataFrame.pivot` to allow `index=None` when `values` is set (#17126)
- Make `hive_partitioning` parameter default to `None`, which is automatically enabled for single directory inputs, and disabled otherwise (#17106)
- Improve ipython autocomplete for LazyFrame and DataFrame (#17091)
- Split `replace` functionality into two separate functions (#16921)
- Improve schema inference for hive partitions (#17079)
- Rename `DataFrame.melt` to `unpivot` and make parameters consistent with `pivot` (#17095)
- print row index in explain + dot (#17074)
- Support top-level `pl.col` autocompletion for iPython (#17080)
- Remove re-export of exceptions at top-level (#17059)
- predicate + projection pushdown in NDJson (#17068)
- Allow (non-)coalescing in join_asof (#17066)
- Turn off coalescing and fix mutation of join on expressions (#17061)
- Expand NDJson glob into one SCAN (#17063)
- Do not parse hive partitions from user provided base directory path (#17055)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Implement general array equality checks (#17043)
- Add `strict` parameter to `DataFrame/LazyFrame.drop` and fix behavior to default to True (#17044)
- Rename `ModuleUpgradeRequired` and `PolarsPanicError` error, remove `InvalidAssert` error (#17033)
- Add `rechunk` parameter to `read_delta` (#16991)
- allow experimental metadata use on release (#17005)
- first working prototype of new streaming engine (#16970)
- Add simple version of `json_normalize` (#17015)
- Change data orientation inference logic for DataFrame construction and warn when row orientation is inferred (#16976)
- Desugar `AND` filter into multiple nodes (#16992)
- Handle textio even if not correct (#16971)
- Properly apply `strict` parameter in Series constructor (#16939)
- Add SQL support for `INTERSECT` and `EXCEPT` ops (#16960)
- Add `PerformanceWarning` to LazyFrame properties (#16964)
- Add `collect_schema` method to `LazyFrame` and `DataFrame` (#16929)
- Allow setting file cache TTL on a per-file basis (#16891)
- Support Decimal inputs for `lit` (#16950)
- Implement multiply and division for lhs duration (#16948)
- Raise on invalid temporal arithmetic (#16934)
- Always end with an in-memory sink on collect (#16928)
- add style namespace (which defers to Great Tables) (#16809)
- Add `Schema` class (#16873)
- Normalize `value_counts` (#16917)
- add `eq/ne` for more `FixedSizeList`s (#16902)
- setup skeleton (#16900)
- add fundamentals for new async-based streaming execution engine (#16884)
- Cache downloaded cloud IPC files (#16892)
- Consistently convert to given time zone in Series constructor (#16828)
- Improve `read_csv` SQL table reading function defaults (better handle dates) (#16866)
- Support SQL `VALUES` clause and inline renaming of columns in CTE & derived table definitions (#16851)
- Support Python `Enum` values in `lit` (#16858)
- convert to given time zone in `.str.to_datetime` when values are offset-aware (#16742)
- Update `reshape` to return Array types instead of List types (#16825)
- Default to raising on out-of-bounds indices in all `get/gather` operations (#16841)
- Support SQL "SELECT" with no tables, optimise registration of globals (#16836)
- Native `selector` XOR set operation, guarantee consistent selector column-order (#16833)
- Extend recognised `EXTRACT` and `DATE_PART` SQL part abbreviations (#16767)
- Improve error message when raising integers to negative integers, improve docs (#16827)
- Return datetime for mean/median of Date column (#16795)
- Update `set_sorted` to only accept a single column (#16800)
- Expose overflowing cast (#16805)
- Update `group_by` iteration and `partition_by` to always return tuple keys (#16793)
- Support array arithmetic for equally sized shapes (#16791)
- Expedited removal of certain deprecated functionality (2) (#16779)
- Removal of `read_database_uri` passthrough from `read_database` (#16783)
- Remove `pyxlsb` engine from `read_database` (#16784)
- Add `check_order` parameter to `assert_series_equal` (#16778)
- Enforce deprecation of keyword arguments as positional (#16755)
- Support cloud storage in `scan_csv` (#16674)
- Streamline SQL `INTERVAL` handling and improve related error messages, update `sqlparser-rs` lib (#16744)
- Support use of ordinal values in SQL `ORDER BY` clause (#16745)
- Support executing polars SQL against `pandas` and `pyarrow` objects (#16746)
- Remove deprecated parameters in `Series.cut/qcut` and update struct field names (#16741)
- Expedited removal of certain deprecated functionality (#16754)
- Remove deprecated functionality from rolling methods (#16750)
- Update `date_range` to no longer produce datetime ranges (#16734)
- Mark `min_periods` as keyword-only for `rolling` methods (#16738)
- Remove deprecated `top_k` parameters `nulls_last`, `maintain_order`, and `multithreaded` (#16599)
- Support order-by in window functions (#16743)
- Add SQL support for `NULLS FIRST/LAST` ordering (#16711)
- Update some error types to more appropriate variants (#15030)
- Initial SQL support for `INTERVAL` strings (#16732)
- Scheduled removal of deprecated functionality (2) (#16724)
- Scheduled removal of deprecated functionality (#16715)
- Enforce deprecation of `offset` arg in `truncate` and `round` (#16655)
- Change default `offset` in `group_by_dynamic` from 'negative `every`' to 'zero' (#16658)
- Constrain access to globals from `DataFrame.sql` in favor of top-level `pl.sql` (#16598)
- Read 2D NumPy arrays as `Array` type instead of `List` (#16710)
- Update `clip` to no longer propagate nulls in the given bounds (#14413)
- Change `str.to_datetime` to default to microsecond precision for format specifiers `"%f"` and `"%.f"` (#13597)
- Update resulting column names in `pivot` when pivoting by multiple values (#16439)
- Preserve nulls in `ewm_mean`, `ewm_std`, and `ewm_var` (#15503)
- Restrict casting for temporal data types (#14142)
- Add many more auto-inferable datetime formats for `str.to_datetime` (#16634)
- Support Decimal types by default when converting from Arrow (#15324)
- Remove serde functionality from `pl.read_json` and `DataFrame.write_json` (#16550)
- Update function signature of `nth` to allow positional input of indices, remove `columns` parameter (#16510)
- Rename struct fields of `rle` output to `len/value` and update data type of `len` field (#15249)
- Remove class variables from some DataTypes (#16524)
- Add `check_names` parameter to `Series.equals` and default to `False` (#16610)
- Dedicated `SQLInterface` and `SQLSyntax` errors (#16635)
- Add `DIV` function support to the SQL interface (#16678)
- Support non-coalescing streaming left join (#16672)
- Allow wildcard and exclude before struct expansions (#16671)
Bug fixes
- Use explicit turbofish to help rustc (#17159)
- Raise on invalid set dtypes (#17157)
- Fix corrupted reads for hive parts from cloud and projection pushdown failure on hive parts (#17152)
- Set intersection supertype (#17154)
- `ChainedWhen` should not inherit `Expr` (#17142)
- Fix decompress_impl for csv with n_rows set (#17118)
- adds "polars-ops/timezones" dependency for "timezones" feature (#17115)
- Fix incorrect window std for chunked series (#17110)
- make `GetOutput::get_field` fallible (#17114)
- Fix melt panic (#17088)
- Fix expression autocomplete in ipython (#17072)
- Exclude index from expansion in rolling/group_by_dynamic (#17086)
- Update some `Series` dunder method type signatures (#17053)
- Fix oob of join with literals and empty table (#17047)
- Don't silently accept multi-table FROM clauses (implicit JOIN syntax) (#17028)
- Don't split up ANDed filters that are group-aware (#17031)
- Harden "async" check for users with out-of-date `sqlalchemy` libraries (#17029)
- error when sort_by of unequal length (#17026)
- properly catch not found explode cols (#17020)
- Correctly convert data frames to NumPy for C index order (#17000)
- Raise on invalid arithmetic shapes (#16986)
- Don't pushdown predicates in cross join if they refer to both tables (#16983)
- Fix projection pushdown with literal joins (#16981)
- Fix edge case in DataFrame constructor data orientation inference (#16975)
- Raise on list of objects (#16959)
- Handle strictness for Decimal Series construction (#15309)
- Don't panic in object to anyvalue (#16957)
- properly set `FAST_EXPLODE_LIST` metadata (#16951)
- Raise informative error when writing object to file (#16954)
- Remove supertype definition of List and non-List types (#16918)
- Remove unwrap in `extend()` (#16890)
- Fix `should_rechunk` check (#16852)
- Ensure `read_excel` and `read_ods` return identical frames across all engines when given empty spreadsheet tables (#16802)
- Consistent behaviour when "infer_schema_length=0" for `read_excel` (#16840)
- Standardised additional SQL interface errors (#16829)
- Ensure that split ChunkedArray also flattens chunks (#16837)
- Reduce needless panics in comparisons (#16831)
- Reset if next caller clones inner series (#16812)
- Raise on non-positive json schema inference (#16770)
- Rewrite implementation of `top_k/bottom_k` and fix a variety of bugs (#16804)
- Fix comparison of UInt64 with zero (#16799)
- Fix incorrect parquet statistics written for UInt64 values > Int64::MAX (#16766)
- Fix boolean distinct (#16765)
- `DATE_PART` SQL syntax/parsing, improve some error messages (#16761)
- Include `pl.` qualifier for inner dtypes in `to_init_repr` (#16235)
- Column selection wasn't applied when reading CSV with no rows (#16739)
- Panic on empty df / null List(Categorical) (#16730)
- Only flush if operator can flush in streaming outer join (#16723)
- Raise unsupported cat array (#16717)
- Assert SQLInterfaceError is raised (#16713)
- Restrict casting for temporal data types (#14142)
- Handle nested categoricals in `assert_series_equal` when `categorical_as_str=True` (#16700)
- Improve `read_database` check for SQLAlchemy async Session objects (#16680)
- Reduce scope of multi-threaded numpy conversion (#16686)
- Full null on dyn int (#16679)
- Fix filter shape on empty null (#16670)
Documentation
- Add doc examples to `concat_list` (#17127)
- Add "coming from pandas" note to `DataFrame.unique` docstring (#17119)
- Fix some warnings during doc build (#17077)
- Properly expose `InProcessQuery` in docs, mark as unstable (#17097)
- Add upgrade guide for Python Polars 1.0.0 (#16914)
- Lots of additions to the SQL reference docs (#16990)
- Minor doctest fixes (#17002)
- Include a doc entry for every exception type (#17001)
- fixup bullet points in write_parquet (#16909)
- Update version switcher for 1.0.0 prereleases (#16847)
- Update link from Python API reference to user guide (#16849)
- Update docstring/test/etc usage of `select` and `with_columns` to idiomatic form (#16801)
- Update versioning docs for 1.0.0 (#16757)
- Add docstring example for `DataFrame.limit` (#16753)
- Fix incorrect stated value of `include_nulls` in `DataFrame.update` docstring (#16701)
- Update deprecation docs in the user guide (#14315)
- Add example for index count in `DataFrame.rolling` (#16600)
- Improve docstring of `Expr/Series.map_elements` (#16079)
- Add missing `polars.sql` docs entry and small docstring update (#16656)
Build system
- Do not change environment on import (#17101)
- Fix config flag for Tracemalloc (#17098)
- Pin optional NumPy dependency to `< 2.0.0` for now (#17060)
Other improvements
- Add missing spaces in `cargo.toml` (#17145)
- Update rustc 2024-06-23 (#17135)
- Minor test refactor for `concat_list` (#17120)
- Remove re-export of data type groups (#17073)
- Add pivot test #17081 (#17090)
- Minor cleanup to better define boundaries of public API (#17051)
- Support directory paths in scans for Parquet, IPC and CSV (#17017)
- Remove re-export of type aliases (#17032)
- Remove file cache test (#17038)
- Update exception imports in test suite (#17035)
- Point polars-stream to crates/ again (#17024)
- Fix failing file cache test in CI (#17014)
- Add some parametric tests for sort functionality (#17008)
- Pin NumPy to <2.0 for now (#16999)
- Use proper join type in test (#16994)
- Fix file cache verbose logging leakage during pytest (#16984)
- Skip another intermittently failing AWS test (#16980)
- Update test suite to explicitly use `orient="row"` in DataFrame constructor when applicable (#16977)
- Remove redundant projection attribute in IR::DataFrameScan (#16952)
- Factor out some apply calls in duration namespace (#16941)
- extend new streaming engine with some initial nodes (#16940)
- Skip intermittently failing AWS test (#16908)
- Refactor expression parsing utils (#16906)
- setup skeleton (#16900)
- Refactor parts of IR. (#16899)
- Move around some existing tests (#16877)
- Remove inner `Arc` from `FileCacheEntry` (#16870)
- Do not update stable API reference on prerelease (#16846)
- Update links to API references (#16843)
- Prepare update of API reference URLs (#16816)
- Rename allow_overflow to wrap_numerical (#16807)
- Set `infer_schema_length` as keyword-only argument in `str.json_decode` (#16835)
- Don't enter streaming engine for groupby-> agg mean/median … (#16810)
- Improve safety of amortized_iter (#16820)
- Remove needless inner type clone (#16718)
- Fix incorrect debug assertion in `ChunkedArray::from_chunks_and_dtype` (#16697)
- Update version resolver for `1.0.0` release (#16705)
- Avoid AWS pinning to outdated crc32c version (#16681)
Thank you to all our contributors for making this release possible! @JulianCologne, @KDruzhkin, @Kylea650, @MarcoGorelli, @Mottl, @Object905, @adamreeve, @alexander-beedie, @bertiewooster, @borchero, @c-peters, @coastalwhite, @datapythonista, @datenzauberai, @dependabot, @dependabot[bot], @eitsupi, @henryharbeck, @itamarst, @lukeshingles, @machow, @marenwestermann, @mcrumiller, @montanarograziano, @nameexhaustion, @orlp, @p3i0t, @ritchie46, @sherlockbeard, @stinodego, @tkellogg, @universalmind303 and @wence-
