Release Notes
2.4.2 -> 2.4.3 2025-September-20
- Fixes
- Fix for the daily cleanup when daily processing had not been executed before. Fixes:
[ERROR] MigrationProcessing Migration failed: no such table: sec_report_processing.
[ERROR] MigrationProcessing Please clear all data manually and run the process again…
2.4.1 -> 2.4.2 2025-August-08
- New
- Daily data processing is now integrated in the example automation pipeline. Have a look at 08_03_automation_supporting_daily_data_example_2.4.2 for details.
- Improvements in the process/task framework
- A new “LoggingProcess” provides a way to log some information before and after a process is executed to make the output more readable.
- The ConcatTasks can now take several root paths, not just one
- A Context can be passed between processes to share information.
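The idea behind these framework additions can be pictured with a toy pipeline. This is only a sketch of the pattern: none of the names below (with_logging, run_pipeline, the dict-based context) are secfsdstools classes, and the real LoggingProcess and Context may look quite different.

```python
import logging
from typing import Any, Callable, Dict, List

logging.basicConfig(level=logging.INFO, format="%(message)s")
LOGGER = logging.getLogger("toy_pipeline")

# Hypothetical stand-ins, not the library's types:
Context = Dict[str, Any]            # shared state passed from step to step
Step = Callable[[Context], None]    # a step reads from and writes to the context


def with_logging(name: str, step: Step) -> Step:
    """Wrap a step so that a message is logged before and after it runs."""
    def wrapped(context: Context) -> None:
        LOGGER.info("--- start: %s", name)
        step(context)
        LOGGER.info("--- done:  %s", name)
    return wrapped


def download(context: Context) -> None:
    # Made-up file names, only used to have something to share.
    context["files"] = ["2025q1.zip", "2025q2.zip"]


def count_files(context: Context) -> None:
    # Uses the information the previous step left in the context.
    context["file_count"] = len(context["files"])


def run_pipeline(steps: List[Step]) -> Context:
    """Run all steps in order, handing the same context to each of them."""
    context: Context = {}
    for step in steps:
        step(context)
    return context


result = run_pipeline([
    with_logging("download", download),
    with_logging("count_files", count_files),
])
print(result["file_count"])
```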
2.4.0 -> 2.4.1 2025-July-26
- Fixes
- update to secdaily 0.2.2 (more robustness / prevent name clashes)
- check for daily data cleanup if needed by secdaily
2.3.0 -> 2.4.0 2025-July-15
- New (Experimental)
- Integration of secdaily to provide daily report updates.
- You have to turn on this feature by adding dailyprocessing = True in the DEFAULT section of the configuration file.
- Please have a look at the notebook 10_00_daily_financial_report_updates for details.
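As a rough illustration of the setting above, the following sketch parses a configuration fragment with Python's configparser. It assumes the configuration file is an INI-style file; only the DEFAULT section and the dailyprocessing = True entry come from this note, and all other entries of a real configuration file are left out.

```python
import configparser

# Assumed file content: a real configuration file contains further entries.
EXAMPLE_CONFIG = """
[DEFAULT]
dailyprocessing = True
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE_CONFIG)

# getboolean() understands values such as True/true/yes/on/1.
daily_enabled = parser["DEFAULT"].getboolean("dailyprocessing", fallback=False)
print(daily_enabled)
```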
2.2.0 -> 2.3.0 2025-May-05
Maintenance release.
- changed development environment to vscode as ide
- changed to use poetry as dependency management tool
- inform user if a newer version is available on pypi.org
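How the library performs this check is not described here. As a hedged sketch of the core comparison only, the hypothetical helpers below (parse_version and is_newer are not library functions) compare plain dotted version strings numerically, so that 2.10.0 counts as newer than 2.9.0. Fetching the latest version from pypi.org is deliberately left out.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a plain dotted version such as '2.4.3' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_newer(latest: str, installed: str) -> bool:
    """Return True if 'latest' is a higher version than 'installed'."""
    return parse_version(latest) > parse_version(installed)


print(is_newer("2.4.3", "2.4.2"))
print(is_newer("2.10.0", "2.9.0"))
print(is_newer("2.3.0", "2.3.0"))
```

Comparing the raw strings instead would get "2.10.0" versus "2.9.0" wrong, which is why the parts are converted to integers first.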
2.1.0 -> 2.2.0 2025-March-07
- New
- CIKXXFilter was introduced for RawDataBag and JoinedDataBag
- ciks_filter parameter was added to the load methods of RawDataBag and JoinedDataBag
- The notebook 09_00_segments_basics gives an idea how you can work with the information in the segment column.
- The concat processes ConcatByChangedTimestampProcess and ConcatByNewSubfoldersProcess now have a switch to choose whether in_memory or file_based concatenation should be used
- ConcatByChangedTimestampProcess and ConcatByNewSubfoldersProcess now also support the concatenation of StandardizedBag
- StandardizeProcess now also works with multiple subfolders where each contains BS, CF, and IS folders
- A new example of a memory optimized pipeline was introduced:
secfsdstools.x_examples.automation.memory_optimized_automation.define_extra_processes.
Have a look at the description of this pipeline in 08_02_automation_a_memory_optimized_example_2.2.0
- Changes
- The is_xxx_bag_path methods in the module secfsdstools.d_container.databagmodel have been moved into the RawDataBag and JoinedDataBag classes, respectively, in the same module.
- The StandardizedBag now also has an is_xxx_bag_path method.
- Other
- GitHub sponsoring account was activated: https://github.com/sponsors/HansjoergW
- GitHub Discussions was activated: https://github.com/HansjoergW/sec-fincancial-statement-data-set/discussions
2.0.0 -> 2.1.0 2025-February-18
The main goal of this release was to improve the memory footprint when working with the framework.
This mainly includes support for predicate pushdown in the load methods, as well
as the ability to concatenate bags directly on the file system, which significantly reduces the
memory footprint during concatenation steps when using “automation”.
Check out the notebook: bulk_data_processing_memory_efficiency
- New
- Predicate pushdown in the load methods of RawDataBag and JoinedDataBag: filters for adshs, statements, forms, and tags are applied directly while the data is loaded
- concat_filebased concatenates RawDataBag and JoinedDataBag folders without loading them into memory
- ConcatByChangedTimestampProcess and ConcatByNewSubfoldersProcess use concat_filebased
- save for RawDataBag, JoinedDataBag, and StandardizedBag now creates the target folder if it does not exist.
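The memory benefit of filtering during loading can be pictured with a small, dependency-free sketch. It is not the library's implementation, and it streams a CSV instead of the format the library stores. The helper iter_filtered and the sample rows are made up for illustration. The point is that rows failing the filter are dropped while reading, so they are never held in memory.

```python
import csv
import io
from typing import Dict, Iterator, Set, TextIO

# Made-up sample data in the shape of a num table (adsh, tag, value).
NUM_CSV = """adsh,tag,value
0001-23-000001,Assets,100
0001-23-000001,Liabilities,60
0002-23-000007,Assets,250
"""


def iter_filtered(source: TextIO, adshs: Set[str], tags: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield only the rows matching the filters, one row at a time."""
    for row in csv.DictReader(source):
        if row["adsh"] in adshs and row["tag"] in tags:
            yield row


rows = list(iter_filtered(io.StringIO(NUM_CSV), adshs={"0001-23-000001"}, tags={"Assets"}))
print(rows)
```

Loading everything first and filtering afterwards would give the same rows, but the full table would have to fit into memory.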
1.8.2 -> 2.0.0 2025-February-11
Introducing the new version of the datasets that includes the “segments” column in the num tables.
The main purpose of this version is to ensure that the new “segments” column does not interfere with existing logic.
The following changed:
- A check at startup verifies that only data from the new datasets is present. If not, the data has to be reloaded
- New NoSegmentInfo filter for raw and joined bags: removes datapoints with non-empty segment info
- StandardPresenter has a new show_segments flag. If True, datapoints with segment information are displayed as well
- Notebook 03_explore_with_interactive_notebook has a new option show_segments for displaying the details of a report
- Support for Daily-Datasets has been removed
1.8.1 -> 1.8.2 2025-January-20
- Ensures data is read only from the archived version of the datasets without the segments column in num.
1.8.0 -> 1.8.1 2025-January-12
- Fix problem with circular import when using the new FilterProcess module in secfsdstools.g_pipeline
1.7.0 -> 1.8.0 2025-January-10
1.6.2 -> 1.7.0 2024-December-22
- Fix for new path to zip files on SEC.gov
- The SEC changed the location of the zip files; this version fixes the path to them
1.6.1 -> 1.6.2 2024-September-15
- Major changes
- Compatibility for Python 3.7 is no longer checked
- Compatibility for Python 3.11 was added
- Minor changes
- secfsdstools.__version__ now returns the version of the library
- IncomeStatementStandardizer
- Calculation for OutstandingShares and EarningsPerShare was simplified and improved
- Validation rule for EarningsPerShare was added
- Please have a look at the comments in 07_02_IS_standardizer
- Ability to customize the standardizer was improved
- The columns that are merged from sub_df into the final result can be extended
- Additional tags that should appear in the final result can be defined
- All constructor parameters of the Standardizer base class can be overwritten via the constructor of the three standardizer classes
- New notebook that shows the different possibilities for customization: 07_04_customize_standardizer
1.6 -> 1.6.1 2024-August-20
- Minor improvements
- filed column added to the result of the present method of the standardizer
- StandardizedBag now has a concat() method to concat multiple instances into one
- Standardizer checks that the data contains just one currency
- IncomeStatementStandardizer now also returns OutstandingShares and EarningsPerShare tags
- 03_explore_with_interactive_notebook.ipynb includes use of the CashFlowStandardizer
- improvements in the README.md -> thanks to Hamid Ebadi
- Documentation
1.6.0 2024-July-12
- New
- Introducing Cash Flow Standardizer
The Cash Flow Standardizer makes the cash flow statements easily comparable.
07_03_CF_standardizer
- Improvements
- Small improvements in the Standardizer framework and rules
1.5.0 2024-May-18
- New
- Introducing Income Statement Standardizer
The Income Statement Standardizer makes the income statements easily comparable.
07_02_IS_standardizer
- Improvements
- Small improvements in the Standardizer framework and rules
1.4.2 2024-Mar-29
- Fix
- The StandardStatementPresenter didn’t consider qtrs when displaying the data. This was a problem for the Income Statement and the Cash Flow.
- Improvements
- Several improvements in the Standardizer in preparation for implementing the Income Statement and Cash Flow Standardizers.
1.4.0 2024-Feb-02
- New
- Introducing the Standardizer Framework and the Balance Sheet Standardizer as a first implementation.
The Balance Sheet Standardizer makes the balance sheets easily comparable.
Check out the following notebooks:
07_00_standardizer_basics
07_01_BS_standardizer
- Improvements
- Efficiency improvements for MultiReportCollector: every zip file is opened just once if there are multiple reports to load from the same zip file.
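The efficiency gain comes from grouping the requested reports by zip file before reading. The sketch below shows only that grouping step. The helper group_by_zip and the report identifiers are hypothetical and do not reflect how MultiReportCollector is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_by_zip(requests: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (zip_file, report_id) pairs by zip file, keeping the request order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for zip_file, report_id in requests:
        grouped[zip_file].append(report_id)
    return dict(grouped)


# Made-up requests: two of the three reports live in the same zip file.
requests = [
    ("2023q1.zip", "report-a"),
    ("2023q2.zip", "report-b"),
    ("2023q1.zip", "report-c"),
]

for zip_file, report_ids in group_by_zip(requests).items():
    # A real loader would open zip_file once here and read all report_ids from it.
    print(zip_file, report_ids)
```

With three requests spread over two zip files, only two files need to be opened instead of three.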
1.3.0 2023-Dec-28
- New
- Notebook 06_bulk_data_processing_deep_dive
This first version shows how datasets can be created with data from all available zip files. It shows a faster
parallel approach, which uses more memory and CPU resources, and a slower serial approach, which uses significantly
fewer resources.
- Package u_usecases introduced.
This package is a place to provide concrete examples showing what you can do
with the secfsdstools library. As a first use case, the logic shown and explained in the notebook 06_bulk_data_processing_deep_dive
is provided within the module bulk_loading.
1.2.0 2023-Dec-02
- API Changes
- MainCoregFilter was renamed to MainCoregRawFilter
- OfficialTagsOnlyFilter was renamed to OfficialTagsOnlyRawFilter
- New
- secfsdstools.e_filter.rawfiltering.USDOnlyRawFilter is new and removes non-USD currency datapoints
- All filters have been implemented for the JoinedDataBag as well: secfsdstools.e_filter.joinedfiltering
- Notebook 05_filter_deep_dive
1.1.0 2023-Oct-28
- API Changes
- Zipcollector now has a factory method that can load multiple zip files as one
- Zipcollector now has a factory method that can load all zip files at once
- Zipcollector factory methods have a new filter parameter “post_load_filter”
- New
- Filter for official tags only -> company specific tags are removed
- RawDataBag and JoinedDataBag now have a copy_bag method
- Notebook 04_collector_deep_dive
1.0.1 2023-Oct-16
- README.md adapted
Added information about using the library on Windows, because the multiprocessing package is used
https://docs.python.org/3.10/library/multiprocessing.html#the-process-class
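The linked Python documentation explains why this matters. On Windows, worker processes are created by starting a fresh interpreter that imports the main module again, so code that starts processes has to sit behind an if __name__ == "__main__": guard. A minimal sketch of that pattern follows, with a made-up square function standing in for any call into the library:

```python
import multiprocessing


def square(value: int) -> int:
    """Stand-in for the real work done in a worker process."""
    return value * value


def main() -> None:
    # The pool starts worker processes; each of them imports this module again.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))


if __name__ == "__main__":
    # The guard keeps the re-imported module in the workers from calling main() again.
    main()
```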
1.0.0 2023-Sep-28
ApiChanges:
- The API has completely changed, it should be more structured now.
Please check out the README.md and the 01_quickstart notebook for details
0.5.0 2023-Jun-02
- use parquet as storing format instead of zipfiles with csv files -> 5-10x faster access to data
- auto discover of new zip files on sec.gov
- launch first time download of zip files without calling the update method
ApiChanges:
- package secfsdstools.d_index was renamed into secfsdstools.c_index
0.4.0 2023-Mar-25
- new MultiReportReader - reads reports from different zipfiles at once
- new CompanyCollector - reads reports for one company from different zipfiles at once
- new merge_pre_and_num() method which only merges the pre and num data but does not pivot it
- new notebook that shows how the data can be analyzed with an interactive jupyter notebook
BugFixes:
- coreg was not considered correctly when merging the data
0.3.0 2023-Feb-04
- integration of https://rapidapi.com/hansjoerg.wingeier/api/daily-sec-financial-statement-dataset. Daily updates instead of quarterly updates.
0.2.1 2023-Jan-21
- class ZipReportReader: helps to read data from a whole zip file; has the same interface as report reader
- class IndexSearch: helps with searching the index_report table
- added a getting started notebook: https://nbviewer.org/github/HansjoergW/sec-fincancial-statement-data-set/blob/main/notebooks/01_quickstart.ipynb
- ensure runs also with python 3.7
- improvements in the API documentation
0.2.0 2023-Jan-14
- first simple API documentation on GitHub Pages: https://hansjoergw.github.io/sec-fincancial-statement-data-set/secfsdstools/
- renaming of internal package structure
0.1.3 2023-Jan-08
- dependencies added into pyproject.toml
0.1.1 2023-Jan-07