Articles tagged with "level-300"

Push-Down-Predicates in Parquet and how to use them to reduce IOPS while reading from S3

Working with datasets in pandas will almost inevitably bring you to the point where your dataset no longer fits into memory. Parquet is particularly notorious for this: because it compresses so well, data tends to explode in size when read into a dataframe. Today we'll explore ways to limit and filter the data you read using push-down predicates. Additionally, we'll see how to do that efficiently with data stored in S3 and why pure pyarrow can be several orders of magnitude more I/O-efficient than the plain pandas version.
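
To give you a taste, here's a minimal sketch of a push-down predicate using pyarrow's dataset API. The bucket path, column names, and threshold are made up for illustration, and AWS credentials are assumed to be configured in the environment:

```python
import pyarrow.dataset as ds

# Open the parquet dataset directly on S3 (hypothetical bucket and prefix).
dataset = ds.dataset("s3://example-bucket/orders/", format="parquet")

# Both the projection and the filter are pushed down into the parquet
# reader: row groups whose min/max statistics can't match the predicate
# are skipped entirely, so far fewer bytes leave S3.
table = dataset.to_table(
    columns=["order_id", "amount"],    # read only these columns
    filter=ds.field("amount") > 100,   # push-down predicate
)

df = table.to_pandas()
```

If you'd rather stay in pandas, read_parquet can pass a filters argument through to pyarrow as well, though this teaser's point is precisely that the pure pyarrow route can be far more I/O-efficient.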

The beating heart of SQS - of Heartbeats and Watchdogs

Using SQS as a queue to buffer tasks is probably the most common use case for the service. Things can get tricky when these tasks vary widely in processing duration. Today, I will show you how to implement an SQS consumer that uses heartbeats to dynamically extend the visibility timeout and accommodate these differences.
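
As a preview, here's a minimal sketch of the heartbeat pattern with boto3 and a watchdog thread. The queue URL, the beat interval, and the process handler are placeholders, not the article's actual implementation:

```python
import threading

import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.eu-central-1.amazonaws.com/123456789012/tasks"  # placeholder


def process(body: str) -> None:
    ...  # stand-in for the actual long-running task


def heartbeat(receipt_handle: str, stop: threading.Event, interval: int = 20) -> None:
    # Watchdog thread: every `interval` seconds, push the visibility timeout
    # out again so the message stays invisible to other consumers.
    while not stop.wait(interval):
        sqs.change_message_visibility(
            QueueUrl=QUEUE_URL,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=interval * 2,  # safety margin over the beat
        )


def consume_once() -> None:
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20
    )
    for message in response.get("Messages", []):
        stop = threading.Event()
        watchdog = threading.Thread(
            target=heartbeat, args=(message["ReceiptHandle"], stop), daemon=True
        )
        watchdog.start()
        try:
            process(message["Body"])
            sqs.delete_message(
                QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"]
            )
        finally:
            stop.set()  # stop the heartbeat whether processing succeeded or not
```

The watchdog beats faster than the visibility timeout it grants, so if the consumer crashes, the extensions stop and the message becomes visible to other consumers again.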

Implementing Pessimistic Locking with DynamoDB and Python

I will show you how to implement pessimistic locking in Python with DynamoDB as our backend. Before we start, we'll review the basics and discuss some of the design criteria we're looking for. In an earlier post, I outlined how to implement optimistic locking with DynamoDB, explaining why locking is useful and which issues it can prevent. If you're unfamiliar with the topic, I suggest you check that one out first.
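
As a preview, here's a minimal sketch of acquiring and releasing such a lock via conditional writes with boto3. The locks table, its lock_id key, and the expiry handling are assumptions for illustration:

```python
import time
import uuid
from typing import Optional

import boto3
from botocore.exceptions import ClientError

# Hypothetical table with a partition key named "lock_id".
table = boto3.resource("dynamodb").Table("locks")


def acquire_lock(lock_id: str, ttl_seconds: int = 60) -> Optional[str]:
    owner = str(uuid.uuid4())
    now = int(time.time())
    try:
        # The conditional write only succeeds if no lock item exists yet
        # or the existing lock has already expired.
        table.put_item(
            Item={"lock_id": lock_id, "owner": owner, "expires_at": now + ttl_seconds},
            ConditionExpression="attribute_not_exists(lock_id) OR expires_at < :now",
            ExpressionAttributeValues={":now": now},
        )
        return owner  # caller keeps this token to release the lock later
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return None  # another process holds a live lock
        raise


def release_lock(lock_id: str, owner: str) -> None:
    # Only the owner that acquired the lock may delete it.
    table.delete_item(
        Key={"lock_id": lock_id},
        ConditionExpression="#o = :owner",
        ExpressionAttributeNames={"#o": "owner"},  # alias in case "owner" collides with a reserved word
        ExpressionAttributeValues={":owner": owner},
    )
```

The conditional delete in release_lock ensures that a process whose lock has expired and been taken over can't accidentally release the new holder's lock.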