This Week in Databend #93
May 14, 2023 · 3 min read
Stay up to date with the latest weekly developments on Databend!
Databend is a modern cloud data warehouse, serving your massive-scale analytics needs at low cost and complexity. Open source alternative to Snowflake. Also available in the cloud: https://app.databend.com.
The upgrade tool meta-upgrade-09 will no longer be available in the release package. If you're using Databend 0.9 or an earlier version, you can seek help from the community.
What's On In Databend
Stay connected with the latest news about Databend.
Databend's Segment Caching Mechanism Now Boasts Improved Memory Usage
Databend's segment caching mechanism has received a significant upgrade that, in a test scenario, reduces its memory usage to 1.5/1000 (0.15%) of the previous usage.
The upgrade involves a different "representation" of cached segments, called CompactSegmentInfo. This representation consists mainly of two components:
- The decoded min/max indexes and other statistical information.
- The undecoded (and compressed) raw bytes of block-metas.
During segment pruning, if any segments are pruned, there is no need to decode the block-metas represented by raw bytes. If they are not pruned, then their raw bytes are decoded on-the-fly for block pruning and scanning purposes (and dropped if no longer needed).
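The idea can be illustrated with a minimal Python sketch. This is not Databend's implementation: the `CompactSegment` class, the `prune` helper, and the JSON-plus-zlib encoding are hypothetical stand-ins chosen only to show statistics kept decoded while block metadata stays compressed until a segment survives pruning.

```python
import json
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactSegment:
    """Hypothetical stand-in for a cached segment (not Databend's actual type)."""
    min_value: int          # decoded segment-level statistics
    max_value: int
    raw_block_metas: bytes  # compressed, undecoded block metadata

    @classmethod
    def from_blocks(cls, blocks):
        # Keep only the summary statistics decoded; compress everything else.
        raw = zlib.compress(json.dumps(blocks).encode("utf-8"))
        return cls(
            min_value=min(b["min"] for b in blocks),
            max_value=max(b["max"] for b in blocks),
            raw_block_metas=raw,
        )

    def decode_block_metas(self):
        # Decoding happens on demand, only for segments that survive pruning.
        return json.loads(zlib.decompress(self.raw_block_metas))


def prune(segments, lo, hi):
    """Return (matching blocks, number of segments decoded) for the range [lo, hi]."""
    kept_blocks = []
    decoded = 0
    for seg in segments:
        if seg.max_value < lo or seg.min_value > hi:
            continue  # segment pruned: its raw bytes are never decoded
        decoded += 1
        block_metas = seg.decode_block_metas()
        kept_blocks.extend(
            b for b in block_metas if b["max"] >= lo and b["min"] <= hi
        )
        # block_metas is a temporary; the cache still holds only compressed bytes.
    return kept_blocks, decoded


segments = [
    CompactSegment.from_blocks(
        [{"id": 0, "min": 0, "max": 9}, {"id": 1, "min": 10, "max": 19}]
    ),
    CompactSegment.from_blocks(
        [{"id": 2, "min": 100, "max": 109}, {"id": 3, "min": 110, "max": 119}]
    ),
]
blocks, decoded = prune(segments, 105, 108)
print([b["id"] for b in blocks], decoded)
```

In this sketch the first segment is rejected from its statistics alone, so only the second segment's bytes are ever decompressed.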
If you are interested in learning more, please check out the resources listed below.
Discover some fascinating code snippets or projects that showcase our work or learning journey.
databend into Python
Databend now offers a Python binding that allows users to execute SQL queries against Databend using Python even without deploying a Databend instance.
To use this functionality, simply import SessionContext from the databend module and create an instance of it:
from databend import SessionContext
ctx = SessionContext()
You can then run SQL queries using the sql() method on your session context object:
df = ctx.sql("select number, number + 1, number::String as number_p_1 from numbers(8)")
The resulting DataFrame can be converted to PyArrow or Pandas format using the to_py_arrow() or to_pandas() methods respectively:
df.to_pandas() # Or, df.to_py_arrow()
Feel free to integrate it with your data science workflow.
Here are some noteworthy items from this week. Perhaps you can find something that interests you.
- Read the two new tutorials added to Transform Data During Load to learn how to perform arithmetic operations during loading and load data into a table with additional columns.
- Read Working with Stages to gain a deeper understanding of stages and learn how to manage and use them effectively.
- Added functions:
What's Up Next
We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.
open-sharing Binary to Databend Image
Open Sharing is a cheap and secure data sharing protocol for databend query in multi-cloud environments. Databend provides a binary called open-sharing, which is a tenant-level sharing endpoint. You can read databend | sharing-endpoint - README.md to learn more.
To facilitate the deployment of open-sharing endpoint instances using K8s or Docker, it is recommended to add the binary to Databend's Docker image.
Please let us know if you're interested in contributing to this issue, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.
We always welcome everyone with open arms and can't wait to see how you'll help our community grow and thrive.
- @Mehrbod2002 made their first contribution in #11367. Added validation for
- @DongHaowen made their first contribution in #11362. Specified database in benchmark.
You can check the changelog of Databend Nightly for details about our latest developments.
🎉 Contributors: 24
Thanks a lot to the contributors for their excellent work.
🎈Connect With Us
Databend is a cutting-edge, open-source cloud-native warehouse built with Rust, designed to handle massive-scale analytics.
Join the Databend Community to try, get help, and contribute!